ESnet Site Coordinating Committee

October 20, 1998

ESSC Steering Committee - Larry Price

The chairman of the ESCC is Bill Wing. He replaces the outgoing chairman Roy Whitney.

The review of ESnet occurred at Berkeley in April. The reviewers found the core services excellent in quantity & quality. They highly endorse the network monitoring, and their first recommendation is to do more of it, including more inside ESnet. They recommended a 3 yr. implementation plan to be updated semi-annually. They want a strategy for network research and for evaluating new technologies, agree that university connections are a problem, and urged cooperation with I2/vBNS. International connectivity is inadequate and should get more money. The needs of NGI, international technology and differentiated services were all agreed to need more focus.

DOE Update - George Seweryniak

Bill Richardson replaces Pena. Aiken to Cisco. Dan Hitchcock no longer acting director, he is now a program director. Tom Rowlett on temporary detail working on the DOECN SIM. Dave Nelson gone temporarily to NASA. Pat Dehmer now in charge.

For interagency activities, the Large Scale Networking (LSN) group is addressing NGI issues. Recently the Joint Engineering Team (JET) was set up to focus on technical issues and the Internet Security Team (IST) to look at security. DOE does not participate in the NGI today. The LSN is co-chaired by George Strawn of NSF and Dave Nelson of NASA. The IST is to work with users to implement security improvements & standards and to coordinate with other LSN teams. The JET is focussing on connectivity to Alaska, improved international connectivity and NGI issues as requested by the LSN. Lab participation is welcomed.

There is no DOE NGI funding for FY 1999, so NGI work can only be done as part of a program. There is an omnibus bill in front of the president to restore $15M to the DOE budget for NGI. Late breaking news: on 10-22-98 "An Additional $15 million was provided in the Science appropriation to participate in the Next Generation Internet program. The Department is directed to award funds under this program using full and open competitive procedures." For FY2000 there is a $17M OMB request for NGI ($10M network, $4M Universities ...). Want to connect better to I2.

For the Strategic Simulation Initiative (SSI) plan for FY 2000, see http://www.doe.gov/ssi/.

The ESnet program review gave an outstanding rating. Actions are needed in network monitoring, providing a long range implementation plan, improved university connectivity, and international connectivity. In 1998 the ESnet program report was completed in early CY98, the LSN workshop was completed in Jan 98, the ESnet program plan was completed, etc. The Sprint contract has been re-negotiated.

ESnet FY99/00 goals include: looking at extra funds for international bandwidth; additional emphasis on net monitoring; and DOECN (the DOE corporate network).

Request for FY99 was $14.5M, but $1.1M short.

DOE has to participate in the SBIR program. All programs are taxed ($2M) to fund this. This year he added 2 categories on computing and networking, got 40 proposals, and awarded $600K in phase I. See http://sbir.er.doe.gov/sbir/. He showed 8 SBIR companies; they include Architecture Technology Corporation (automatic LAN problem diagnosis & repair) and NetPredict. George can provide details on request.

Network requirements will outstrip the ability to fund bigger pipes especially for international pipes. Need to look at closer working with other agencies and Internet 2 to improve university connectivity. We also need to be proactive and solicit network requirements for SSI & ASCI.

ESnet Update - Jim Leighton

Traffic continues to grow. Bytes accepted went from 5.92 TB in Sep-97 to 10.9 TB in Sep-98. A few new sites have been added. The Seattle hub is a possible collaboration with PNNL and NASA. They are waiting for an expression of interest from NASA, which will provide the justification for an OC3 between ESnet and the Seattle hub. Another possible connection would be using NTON.
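
The two traffic figures above imply a growth rate that can be worked out directly. The following is a minimal sketch, assuming steady compound growth between the two quoted months (an assumption; the notes give only the two end points). The function name is illustrative only.

```python
import math

def growth_stats(start, end, months):
    """Return (ratio, compound monthly rate, doubling time in months)."""
    ratio = end / start
    monthly_rate = ratio ** (1.0 / months) - 1.0
    doubling_months = months * math.log(2) / math.log(ratio)
    return ratio, monthly_rate, doubling_months

# Figures from the notes: bytes accepted in Sep-97 and Sep-98, in TB.
ratio, monthly_rate, doubling_months = growth_stats(5.92, 10.9, 12)

print(f"year-on-year factor: {ratio:.2f}")
print(f"compound monthly growth: {monthly_rate:.1%}")
print(f"doubling time: {doubling_months:.1f} months")
```

Under that assumption the accepted traffic is a little short of doubling each year.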

ANL & LBNL are currently under test at OC12c. Low level corruption (1.2E5). The upgrade is not justified by existing traffic, but it is easy to do. OSTI upgraded to OC3c. Savannah River has requested T3 or OC3.

Need to shift emphasis from vBNS since its contract expires Mar '00. Future focus will be on peering with Abilene by connecting to the Internet-2 GPoPs. Existing peering points are at Chicago (OC3 between ESnet & STAR-TAP), maybe SDSC and Perryman (Jim does not think the Perryman one will fly). The next connection will be at UCB (it is almost "free"); it will be OC12/IP/SONET and will interconnect with both CALREN-2 and Abilene. They are also considering Atlanta and Seattle as additional candidates.

NGIXen is an effort by the LSN/JET committee to define NGI eXchange points, likely to include FIX-W, Chicago, DC. There is no particular interest on the part of ESnet in providing additional inter-agency interconnect points. Since ESnet is already at Chicago and FIX-W it may just be a relabelling exercise.

For SC98 Sprint will put in an OC3, and ESnet will provide IP and native ATM support. LBL will have a computer science booth there.

ESnet is looking at supporting SSP requirements (Tbps in 2002, i.e. 10Gbps fiber with 100 wavelengths gives 1 Tbps.). Have had conversations with Qwest who are interested in organizations with high bandwidth requirements. Put together a rough proposal to Albuquerque. Ten years ago we were at 56kbps on the backbone, so Tbps in next 10 years may not be a stretch.
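
The arithmetic behind the Tbps remark can be checked in a few lines. This is a rough sketch only: the 10 Gbps, 100 wavelength and 56 kbps figures come from the notes, while taking the OC12 line rate (622.08 Mbps) as today's top backbone speed is an assumption made here for illustration.

```python
# Wavelength-division arithmetic quoted in the notes.
WAVELENGTHS = 100
RATE_PER_WAVELENGTH_BPS = 10 * 10**9       # 10 Gbps per wavelength
aggregate_bps = WAVELENGTHS * RATE_PER_WAVELENGTH_BPS

OLD_BACKBONE_BPS = 56 * 10**3              # 56 kbps, ten years ago (from the notes)
# ASSUMPTION: OC12 line rate stands in for the current top backbone speed.
ASSUMED_CURRENT_BPS = 622_080_000

# Growth achieved over the past decade vs. growth still needed to reach 1 Tbps.
past_factor = ASSUMED_CURRENT_BPS / OLD_BACKBONE_BPS
needed_factor = aggregate_bps / ASSUMED_CURRENT_BPS

print(f"aggregate capacity: {aggregate_bps / 1e12:.0f} Tbps")
print(f"growth over the past decade: about {past_factor:,.0f}x")
print(f"growth needed to reach 1 Tbps: about {needed_factor:,.0f}x")
```

With that assumed starting point, the step to 1 Tbps is smaller than the step already made from 56 kbps, which is consistent with the "may not be a stretch" comment.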

DOENG is taking over the DOEEM network. They have a series of workshops (ENISIM). Have had 3 meetings. The people attending the meetings are strong on requirements and process, but not highly technical. There has been an agreement on a non-encroachment policy to avoid routing problems. The management model may be different from ESnet, e.g. each site may manage its own DOECN router, and the central NOC may only be available 9a-5p Washington time.

International issues:

There is frustration in getting good international access, in particular to the university crowd. ESnet has been talking to NSF to see if it can improve the peering at Perryman, but there are AUP problems. NSF proposed to peer in exchange for ESnet partial funding of the interconnect between Perryman and STAR-TAP. DFN is not interested, and there are some thorny routing issues. Given that vBNS (the NSF backbone) will disappear in 2000, and given the political issues, Jim recommends waiting and trying to get better connectivity with Abilene, who are too busy to think of issues like this at the moment.

DFN is looking to increase the Trans-Atlantic bandwidth to at least OC3, but this will not help due to peering issues. They will maintain a presence at Perryman as long as ESnet is there. There is priority traffic at 1Mbps from the current 90Mbps DFN link which is working well. There is a good deal more traffic going to Germany than coming from Germany.

The traffic to Italy has increased in the last year. They plan to move from a T1 to an E1, and move it to Perryman. There are longer term plans to put a T3 to the US, and they want connections to STAR-TAP or Perryman sometime late '98 or early '99. STAR-TAP will add more delay but that is probably not a big issue.

Plan to upgrade the T1 from KEK to LBL to a 10Mbps ATM connection to Chicago, which is expected to be operational in Oct '98 (i.e. any day now). The ITER program pays for a connection to JAERI but funding is questionable and there are no upgrade plans.

CERN has 2*E1 which look OK at the moment. They are looking at an upgrade. DANTE has connectivity to Perryman which looks good. Have opened discussions with JANET people. They currently come into Canada. JANET will extend to the US in NY in 1Q99, will consider establishing a connection for peering, and will also look at QoS (similar to DFN).

Research & Advanced Technology

They are working on DiffServ, testing backbone technologies including traffic shaping, queue management, policing, classifiers, marking etc. They are looking at an architecture for an ESnet DiffServ approach and have deployed the Clipper project testbed facilities, which they plan to test over OC12 soon. For OC12 they are now passing 373 Mbps of IP data with about 0.002% loss between ANL & LBNL. They are also testing the shaping capability of the GSR using an HP analyzer to verify GCRA compliance.
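
For readers unfamiliar with the GCRA compliance check mentioned above, here is a minimal sketch of the Generic Cell Rate Algorithm in its virtual scheduling form. It is illustrative only and says nothing about how the HP analyzer or the GSR implement it; the class name, function name and the sample trace are all made up for the example.

```python
class Gcra:
    """GCRA(T, tau) in its virtual scheduling form.

    T is the emission interval (1 / contracted cell rate) and tau is the
    tolerance, both in the same time unit as the arrival timestamps.
    """

    def __init__(self, emission_interval, tolerance):
        self.T = emission_interval
        self.tau = tolerance
        self.tat = None  # theoretical arrival time of the next cell

    def conforms(self, arrival_time):
        if self.tat is None:
            self.tat = arrival_time          # first cell always conforms
        if arrival_time < self.tat - self.tau:
            return False                     # too early: non-conforming, TAT unchanged
        self.tat = max(arrival_time, self.tat) + self.T
        return True


def check_trace(arrivals, emission_interval, tolerance):
    """Return one conformance verdict per arrival timestamp."""
    gcra = Gcra(emission_interval, tolerance)
    return [gcra.conforms(t) for t in arrivals]


# Hypothetical arrival times, contract of one cell per 10 time units.
strict = check_trace([0, 5, 10, 20], emission_interval=10, tolerance=0)
loose = check_trace([0, 5, 10, 20], emission_interval=10, tolerance=5)
print(strict)
print(loose)
```

A shaper is doing its job when every cell in its output trace conforms for the contracted T and tau.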

They are looking at throughput performance issues including a systematic study of the interaction between TCP fall-back, router shaping, IP & ATM queue management, ATM policing and other variables.

They are also looking at an automated help desk to provide first line defense for help via a Web based interactive help desk.

The Cisco Web cache is being evaluated, but it is fairly limited in features and they are waiting for much needed new features and bug fixes. It is a transparent web cache vs. a proxy cache (e.g. Squid and Harvest). Some people consider transparent caching "evil", especially since it can be frustrating if the cache misbehaves. The recommendation is that ESnet NOT deploy, since the potential efficiencies in bandwidth utilization do not offset the problems introduced.

Other research activities are PIM deployment, and IPv6.

They are looking at putting together a 2-3 member research group to focus on applied research and advanced technology and will work with the (late lamented) LBL network research group.

Sprint Contract Extension

They have signed a 2 yr. contract extension with Sprint to extend to August 2001. The contract includes ATM port charge reductions for FY99 and additional reductions for FY00 and FY01. They are investigating means of reducing local loop charges. They are now developing plans for starting a follow-on contract in parallel, with a goal of 0.1-3 Tbps capability in 2003.

Miscellaneous

ESnet was identified as mission critical. ESnet has successfully passed the DOE Y2K audit. They have demonstrated compliance including vendor statements, router non-reliance on year-date for operation, and completed testing for CY2000 roll-over and leap-year oddities, and they have a contingency plan and procedure to invoke if needed. ESnet scored as "perfect" w.r.t. Y2K compliance. The test plan is on the Web.

Allen Sturtevant and Rick Schnetz have returned to LLNL. Tony Genovese (replaces Allen as head of the group) has returned from Microsoft and Dan Peterson has joined from UCB.

ESnet are running mod_perl on the Apache server. They saw a substantial performance improvement. It also has a database interface module with a Perl interface to Oracle supporting persistent connections. The contact person at ESnet is vui@es.net.

I met with Bill Lidinsky, Gary Roediger and Eric He (he@es.net 630-840-5216) about PingER futures. We came up with a plan to explore further migration away from SAS. The first step is to modify the HEPNRC front end data gatherer to not only record the data into SAS, but also to make a copy of what it has gathered in an ASCII flat file. This will also involve a port from Unix to WNT. Eric He will look at Dave Martin's code to judge the effort required and then implement. SLAC (Warren) can then copy the data from the flat ASCII file, and delete the file when copied. This removes the need for SLAC to access the remote sites to gather data.

The next step will be for HEPNRC to make a static copy of part of the SAS database in an SQL database for testing/development purposes. This will be done on a fast (dual processor 300 MHz) WNT server and will probably use a Microsoft Access database. Then HEPNRC is interested in developing a Java servlet to run on the server, which will use the JDBC/ODBC Java classes to access the data, and prepare graphs on demand from a web form running on a browser. The idea is that if successful this may replace the current SAS way of generating these graphs. In that case we will also need to migrate the data from SAS to the Microsoft Access database. If the performance of Access is inadequate, then since we have a standard API (JDBC/ODBC) it should not be hard to change over to another SQL database, or maybe even remain with SAS.

As far as the servlet goes, we (SLAC) should see whether this might be based on Tony Johnson's Java Analysis Studio (JAS). One of the issues would be whether JAS can serve up GIF images.

TCP Performance Considerations - Rebecca Nitzan

WFQ (Weighted Fair Queuing) gives preference. CAR (Committed Access Rate) gives rate limits. DWRED (Distributed Weighted Random Early Detection) helps competing TCPs under congestion.

She set up a lab test with an Alpha talking to an Alpha via Cisco routers into an ATM cloud. They saw large amounts of packet loss in bad cases. She used tcplook to report on TCP progress from tcpdump output. It shows the data and acks with the sequence number plotted against time.

CAR uses a token bucket queuing scheme (i.e. packets get tokens from a bucket before being put on the network). If there are not enough tokens (or the TCP window size is too large) then packets get dropped before being put on the network. This causes TCP to drop back into slow start, and one gets oscillations as it rebuilds the window size and then over-runs the available tokens again. Why doesn't TCP back off to a steady state, as it does with other rate-limiting parameters? The answer is there was not enough buffer space for TCP's window size between the sender & receiver. ATM policing uses a leaky bucket variation. The sending TCP slows down waiting for an ACK from the receiver, and will wait once an entire window's worth is in flight.
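
The token bucket behaviour described above can be illustrated with a small sketch. This is a generic token bucket policer, not Cisco's CAR implementation; the class name, rate, burst size and arrival pattern are hypothetical values chosen to show a burst being dropped.

```python
class TokenBucket:
    """Generic token bucket policer: tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)   # bucket starts full
        self.last = 0.0                    # time of the previous arrival

    def admit(self, packet_bytes, now):
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True                    # forwarded
        return False                       # dropped: not enough tokens


# Hypothetical policer: 1000 bytes/s with room for one 1500-byte packet.
bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
arrivals = [(0.0, 1500), (0.0, 1500), (1.0, 1500), (1.5, 1500)]
decisions = [bucket.admit(size, now) for now, size in arrivals]
print(decisions)
```

A TCP sender whose window exceeds the bucket depth sees exactly this pattern: the front of each burst gets through and the tail is dropped, which forces the window down again.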

For DWRED the router drops packets based on precedence as queues fill under load. The good news is that random TCP senders back off, which disrupts simultaneous TCP oscillations. The bad news is that UDP's send rate does not back off. Luckily most of the traffic today is TCP; however, more and more non-TCP traffic (e.g. voice & video) is being put on the network.
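
The precedence-based dropping can be sketched with the standard RED drop curve applied per precedence. This is a simplified illustration, not the router's implementation: the thresholds and maximum drop probability below are hypothetical, and a real router would feed in an exponentially weighted average queue depth, which the sketch takes as given.

```python
def wred_drop_probability(avg_queue, min_th, max_th, max_p):
    """RED drop curve: 0 below min_th, linear up to max_p, then drop everything."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# HYPOTHETICAL per-precedence profiles (queue depths in packets).
# Higher precedence gets a higher minimum threshold, so it is dropped later.
PROFILES = {
    0: {"min_th": 20, "max_th": 40, "max_p": 0.10},
    5: {"min_th": 35, "max_th": 40, "max_p": 0.10},
}


def drop_probability(precedence, avg_queue):
    return wred_drop_probability(avg_queue, **PROFILES[precedence])


low = drop_probability(0, 30)    # low-precedence traffic at average depth 30
high = drop_probability(5, 30)   # high-precedence traffic at the same depth
print(low, high)
```

Because drops are probabilistic and start before the queue is full, different TCP flows back off at different times rather than all at once.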

The summary is to be careful configuring queues in the routers for "improved performance". Consider the traffic mix: is it TCP or UDP? Then watch the TCP performance to see if it is working as one wanted/expected.

Network Group Update - Mike Collins

IPv6 Update - Bob Fink

We are now about 4 years into the IPv6 project. There have been a lot of (> 50) implementations, some now in production. Until Microsoft delivers it will not happen. Microsoft says they are committed to implementing IPv6; it is critical to the continued growth of the Internet, which is critical to Microsoft's survival. They are 3-4 years from it being in production use. So we are in the "dog days". The Research & Education folks need to carry the message that IPv6 works by getting it into our networks and motivating some user applications to run over it. ESnet is carrying the message to I2, CA*net2/3, CAIRN and the European R&E nets. The current MS version of IPv6 is an experimental version which does not implement Microsoft networking (file access etc.) on top of it. Cisco will not implement until release stream 12.0 (will come after MBGP), i.e. release in a year from now.

The existing IPv4 registries will be issuing IPv6 addresses in January 1999. There is a router renumbering protocol to allow one to change out old IPv6 addresses for new ones.

ENWG - Phil DeMar

Discussion on who we are, what we should do, what is the charter, is there a consensus. Possible roles are to coordinate & participate in inter-site projects and test beds, and to share information on new technology roll-outs. What is the balance between presentation & discussion, and what is the boundary between the ENWG and the ESCC?

There was a site gigabit network status survey. Plans are mostly in testing & evaluation stage right now. A number of sites anticipate deployment within the next 12 months. It will be initially at the core, then at edge. The deployment on the site boundary routers seems unclear in timing & implementation. There are many concerns about upgrading 75xx class routers with Gbit interfaces.

Layer 3 switches are in test & evaluation at ORNL. There is little/no performance degradation, the vendors' performances were quite similar, and the impact of filtering needs further study.

SLAC is looking at xDSL as a higher performance cheaper alternative to ISDN.

There were 2 sessions on security. FNAL & LLNL are using boundary router (Cisco) Netflow for logging off-site network access. There is concern about the next generation routers being able to provide such information. The "bro" system from LBNL is able to detect intrusions. It has a flexible programmable mode and is freely available.

Network monitoring status covered the techniques and mechanisms and directions they are going in. Comparisons with Surveyor look good and are complementary.

Some participants felt it was beneficial, similar to the ESCC meetings of yesteryear. The presentations were good but there was not enough time for productive open discussions. The track record of between-meeting activities is well established: it is low priority in over-subscribed work environments, so the ESCC meetings are the likeliest spots to do something useful. We need to do what needs to be done when it needs to be done, but not before. Maybe we need a workshop-like approach to covering one or two issues of common concern in some depth. Possible topics include revisiting the gigabit implementation plans in depth and dial-in access/telecommuting, especially VPNs. We will probably need more time, a tighter focus and better preparation. The bottom line is that the meeting has the potential to be useful but needs some refining.

The Mercury Project - LBNL

http://www.lbl.gov/~dch/

This was based on the Mercury Project report, which is available on the Web at the above address. It is a single email system for multiple systems (Unix, NT ...). It is primarily a migration (from cc Mail, QuickMail (no longer supported by vendor) & VMS) to IMAP4 and encompasses centralized directory servers and security. Email is growing at 60% per year. The criteria covered client, server, management and security. The client had to be multi-platform, with an integrated desktop, MIME standard enclosures, mobile computing support (mailboxes/address books everywhere; people can logon, download messages, logoff, read messages, compose messages, logon and send messages), centralized administration and a Web client. The server had to be open system / standards based (can be replaced when the vendor no longer supports it), server centric (mail to be stored in a central location), and scalable beyond 5 years of Lab growth, with centralized directory services with a single sign on. Bandwidth becomes more critical with enclosures. It must have high availability & reliability and backup & disaster recovery. They want secure connections, anti-virus scanning and cleaning, anti-SPAM, and digital certificates. They also want a lower Total Cost of Ownership (TCO).

The solution uses IMAP4 with Netscape Messaging Server 3.6 (total software cost ~$200K for site license for clients and server etc.) on Solaris 2.6 on a Sun Enterprise 3000, 4GB RAM, 4*336MHz processors, a 300 GB RAID array, Veritas file system / volume manager with Legato for single message store backups. They run a separate POP server. They have 2 IMAP machines (one backing up the other) acting as messaging servers with automatic switch-over (HA configuration) and common access to the data store (300GB hardware RAID divided into 4). They have 100 Mbps Ethernet and will be moving to 1Gbps Ethernet. They wrote their own server monitoring to get cpu usage, load average and disk usage; they look at login time, time to check a new email, and time to download a 1MB and an 8MB enclosure. They also track number of messages sent, received, bytes sent etc. For sending they use Sendmail 8.9 running with 2 different configurations listening on 2 different ports. They scan for viruses. For UBE they use the Realtime Blackhole List and do reverse name lookup. The viruses discovered per month range from 40-90 and are mainly Word based. They only support Netscape Communicator 4.05 on Mac 68k/PPC, Windows 95/98/NT, Solaris / IRIX / HPUX / OSF / Linux. They have custom configurations. They also support Pine & Microsoft Outlook Express.
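
For context on the blackhole list check mentioned above: DNS-based blackhole lists are conventionally queried by reversing the octets of the connecting IPv4 address and looking the result up under the list's zone. The sketch below only builds the query name and does no network lookup. The zone name and the address are placeholders, not the list or settings LBNL uses.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blackhole list zone."""
    address = ipaddress.IPv4Address(ip)    # raises ValueError on malformed input
    octets = str(address).split(".")
    return ".".join(reversed(octets)) + "." + zone


# HYPOTHETICAL zone name and a documentation-range address, for illustration only.
query = dnsbl_query_name("192.0.2.99", "rbl.example.org")
print(query)
```

A mail server would resolve that name and treat a successful answer as "listed", refusing or tagging the message.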

User migration went well. There is a web page signup form to request an account. Installation was free, but there was a charge for migration of email. Only 36% converted their old email to IMAP. They converted about 400 users/month and now have 2000 users. They are poised to turn off QuickMail & cc Mail. The most useful resource for users converting was Web pages, followed by word of mouth. They also had training sessions, reference cards, handouts/flyers, and brown bag lunches. Macintosh users were less happy with the conversion than PC users (I expect because older, lower speed 68K based Macs do not perform well). From surveys users are very happy with the migration and the new system. Now they are doing a calendaring system, and looking at FAX routing over IP, PBX voice mail integration, and voice mail enclosures. One and a half FTEs maintain the complete system and the total system/conversion cost was about $2M.

QoS

Axioms: there are 3 types of people. Optimists say there is no need for QoS, simply add bandwidth; hedgers say better have a plan just in case there is not enough bandwidth; pessimists say for E2E you need QoS. Network managers will not trust end system users. The most important place for QoS is on international links, where it is hardest to deploy (settlements, politics, etc.).

If one looks at the various congestion domains: on the subnet, simply over-provision and add class based queuing (most routers/switches provide multi-queue capability). The wide area might have quotas, with feedback via pricing. CBQ works since one gets an inflection point for low-priority traffic at 30-80% loading.

Design issues include the cost of the resource versus the cost of the controls. Control cost must include policy jitter (i.e. the policy changes with time as one learns more, or different requirements are added, or different administrations/goals are put in place!). Flow setup overhead vs. flow length. Statistical vs. guaranteed quality. Prioritization via privilege, desire or need (e.g. is it done by application? But the application might be port agile, and the need of a freshman running video conferencing may not be the same as that of the university president using the same application). So one maps into price, but is the price based on usage or quota? Is privilege associated with user or port?

The congestion avoidance tools can be divided into 2. The first has end-system cooperation with adaptive applications, adaptive protocols and admission control. The second does not require the end-system to cooperate and includes traffic shaping (to reduce burstiness), policing (requires a contract which is then policed), eligibility ports (configure some ports to get premium treatment), and behavior shaping (user adaptation, e.g. by pricing).

Does one really want reservations? Event duration is needed in order to schedule, big-chunk reservations require sequestered bandwidth, small-chunk reservations are unnecessary, applications may need problematic bi-directional reservations (through potentially multiple administration domains), reservations invite policy complexity and expect marketplace rather than reservations to dominate. It is unclear how RSVP will evolve in such an environment especially given its requirements for state across the network.

The IETF Diff-Serv approach results from doubts about IntServ/RSVP scalability. The concept is to abandon end to end per flow reservation, with packets instead being marked at the edge of the WAN with an equivalence class. The current debate is over the semantics for the TS/DS bits. The prognosis is favorable.

The Internet 2 QoS WG is focussed on the DiffServ approach with a bandwidth broker at the "edge", with aggregation of flows into classes and no per-flow reservations within the core. Some WG members think DiffServ is too simplistic/optimistic, others think the exact opposite.

The commercial internet probably won't be best effort for long. They want to charge for premium services. A reservation model is unlikely. Some form of quota/CAR approach seems probable. The pricing model is unclear. The need for recharge is likely (it is unlikely that one can provide a premium service to a select few and charge it to infrastructure).

Interesting issues include: what to do with over-quota packets (could drop rather than downgrade, since they are likely to arrive out of order and be dropped anyway); incoming traffic may cause the biggest part of NSP charges; how do subscribers know they get what they paid for, and what tools will the network manager have to show that the user is getting what was paid for?

Current campus networks are not ready for QoS: they will need fork-lift upgrades to get rid of shared hubs. Widespread 802.1q support is expected, but vendors assume the end-user will set priority, which will not sit well with network managers and will need authentication, which most equipment does not support yet. Switching & full-duplex are needed. Access by a single person from alternate locations implies multiple authentication methods.

There are no guarantees. There is a choice of a busy or degraded service, which are very different strategies. There is still no adequate bandwidth. If you need absolute predictability then don't share.

Conclusions: future peak/aggregate usage patterns are unknown and no one can say how much BW & QoS capability will really be needed. In the enterprise, providing adequate QoS without per-flow lookups or reservations appears plausible. The Fast/Gbps Ethernet infrastructure can reduce the odds of congestion, with multiple queues and CAR policing providing additional headroom. But even the minimalist CoS & DS have worrisome operational implications. Need to reduce policy jitter as a network design goal. This requires a solution with a very small set of policy choices, otherwise policy management will eat you alive.

CIAC "State of the Threat" Update - Jerry Rayome

I have a copy of the presentation.

Number of sites being scanned is increasing. 59% of reported incidents are scans. Scans lead to break-ins. Normal modes of intrusion: unpatched systems, clear text passwords, poorly configured firewalls or border routers. There are many sophisticated tools out there, but many are quite difficult to use. It is far easier to exploit existing problems since people do not patch (e.g. IMAP, RPC.statd, sendmail, echo/discard flooding, IP redirect). Web home pages have been hacked (e.g. NY Times). There is a buffer overflow phf stack, and most recently a lot of SGI attacks to GET cgi-bin/aglimpse ... cat%20/etc/password etc. There are still a lot of SMURF denial of service attacks. Incidents/month are going up, now around 200; a year ago it was about 40. For DOS intrusions sniffer attacks are about 17%, buffer overflow & egg-drop are about 8% each, tear-drop about 4%, smurf about 4%. For incidents the main one (~800) is scans/probes/door rattling, followed by attempted intrusions (~350), followed by intrusions (123). 1998 attempt leaders are Phf (28%), Telnet (11%), smurf (10%), and IMAP (6%). The old attacks still work, so why bother with new attacks? Current leaders are phf, linux exploits (imap exploit still works), denial of service (smurf, half-open connections), sniffers, DNS, problems with email (hoaxes, spamming), software piracy (misconfigured servers), back orifice (Windows 95/98), IRC, MSCAN, Cisco vulnerabilities (no claims to have seen exploits).

The scan response from CIAC is to email the network admin, record the incident in a database, analyze the source, target & frequency against existing data, notify other sites of new/significant scans, and confirm with sites about vulnerabilities and the significance of recent scans.

Network scans can be categorized into: obvious & noisy (all well known ports, all hosts on subnet); less obvious and stealthy (time delays in seconds, made to look like noise, looking for a particular vulnerability); very stealthy (get underneath the radar thresholds of some network intrusion detection systems, i.e. the hackers know a lot about the detection systems), looking for who knows what with time delays in weeks, many, many targets, and a source address that rotates with organized hacking (i.e. groups of hackers collaborating), designed to look like network noise.
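
The difference between noisy and stealthy scans can be illustrated with a toy threshold detector. This is a sketch under stated assumptions, not how CIAC or any particular intrusion detection system works: the log format, the function name, the addresses and the thresholds are all hypothetical.

```python
from collections import defaultdict


def find_scanners(events, window_s, min_targets):
    """Flag sources that touch at least min_targets distinct (host, port)
    pairs within any window of window_s seconds.

    events: iterable of (timestamp_s, source, dest_host, dest_port).
    """
    by_source = defaultdict(list)
    for ts, src, dst, port in events:
        by_source[src].append((ts, dst, port))

    flagged = set()
    for src, records in by_source.items():
        records.sort()
        start = 0
        for end in range(len(records)):
            # Shrink the window until it spans no more than window_s seconds.
            while records[end][0] - records[start][0] > window_s:
                start += 1
            targets = {(d, p) for _, d, p in records[start:end + 1]}
            if len(targets) >= min_targets:
                flagged.add(src)
                break
    return flagged


# Hypothetical log: one source probes 20 ports in 20 seconds,
# another probes 20 ports at one probe per day.
noisy = [(i, "10.0.0.1", "10.1.1.1", 1 + i) for i in range(20)]
slow = [(i * 86400, "10.0.0.2", "10.1.1.1", 1 + i) for i in range(20)]

short_window = find_scanners(noisy + slow, window_s=60, min_targets=10)
long_window = find_scanners(noisy + slow, window_s=30 * 86400, min_targets=10)
print(short_window)
print(sorted(long_window))
```

The slow source slips under the one-minute threshold and only shows up when a month of history is correlated, which is why the very stealthy category is hard to catch at a single site.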

When the scanning stops maybe they have got what they want from your site; maybe it's too hard to get in (site adequately configured/patched); maybe ISP or law enforcement has shut them down.

Need to work together: report intrusions (with secure communications), work with law enforcement (there is a National Infrastructure Protection Center (NIPC)), and work with vendors (they had success with addressing the new tear drop attack).

Proactive defense requires training etc.: install firewalls properly and monitor them, monitor network traffic, segment internal networks, separate web servers and email servers, make single purpose servers, use secure connections (ssh & tunneling for authentication & encryption), monitor sensitive systems first, patch external vulnerabilities first if you can't patch them all, use one-time passwords or choose good passwords, and use anti-virus software.

Requirements for Strong Authentication for Internet Access to Lab Computers - Matt Crawford

How do hackers get in: 70% use stolen passwords, 20% exploit a vulnerability found in a scan. Passwords are stolen by "sniffers" installed on a subnet at some cracked site and then exchanged by IRC. So need strong authentication for all offsite Internet access to Lab computers. This stops hackers from stealing passwords as valid users login from off-site. Can do with one time, encrypted passwords, Kerberos, DCE, PKI, SecurID card.
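
To show why a one-time password defeats a sniffer, here is a minimal sketch of a hash-chain scheme in the style of Lamport's one-time passwords. It is an illustration only, not any of the products named above: SHA-256 is swapped in for whatever hash a real system uses, and the seed, chain length and class name are hypothetical.

```python
import hashlib
import hmac


def hash_n(secret: bytes, n: int) -> bytes:
    """Apply SHA-256 to the secret n times."""
    value = secret
    for _ in range(n):
        value = hashlib.sha256(value).digest()
    return value


class OtpVerifier:
    """Server side: stores only the last accepted link of the hash chain."""

    def __init__(self, last_hash: bytes):
        self.stored = last_hash

    def verify(self, candidate: bytes) -> bool:
        # A valid password is the pre-image of the stored value.
        if hmac.compare_digest(hashlib.sha256(candidate).digest(), self.stored):
            self.stored = candidate        # each password is accepted only once
            return True
        return False


# HYPOTHETICAL user secret and chain length.
secret = b"hypothetical seed phrase"
verifier = OtpVerifier(hash_n(secret, 100))

otp1 = hash_n(secret, 99)
first_login = verifier.verify(otp1)        # legitimate login
replayed = verifier.verify(otp1)           # sniffer replays the captured value
second_login = verifier.verify(hash_n(secret, 98))
print(first_login, replayed, second_login)
```

A password captured on the wire is useless afterwards, because the server now expects the previous link in the chain, which cannot be computed from the captured one.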

The SLCCC technical working group has been tasked with recommending how to do strong encryption. Can logon to a server on your LAN and then encrypt over the Internet, but a sniffer on your LAN will capture the password. The carrot approach is to have a single sign-on to a site, also cross-authentication between sites. The stick approach is to make sure the user sees that recovering from intrusions wastes time for staff & users.

http://www.jlab.org/exp_prog/SLCCC/ tells about the SLCCC. Bob Cowles is the SLAC Technical Working Group representative.

Network Information Services Group - Tony Genovese

They want to create a standard documentation set with white papers, functional specifications and deployment plans. Then there will be a review process with peer review and community review. Also they will worry about retirement of service, and work on a retirement plan (e.g. for X.500 & PH).

For the VCSS 5.3 release the service manager is Joe Metzger, with goals to clean up code, authenticate individual users, and restructure the underlying database, with a production release on 2/8/99.

The authentication service manager is Michael Helm whose goal is to provide an LDAP based authentication service for web server. The initial service areas will be VCSS and the web server. They are alpha testing with webmaster and in 1Q99 will add mail service and personal certificates. Production is scheduled for 2/1999.

They are still working on an automated help desk for which the service manager is Marcy Kamps. The goal is to provide an intuitive web site for problem resolution. The software vendor has been selected (ServiceSoft Corporation for knowledge database). Training & prototype development will lead to a beta release to a limited part of the ESnet community in 3/99.

Studio One goals are to capture, store & stream video & audio programming to clients utilizing IP and native ATM. Sue Smith is the service manager. The hardware & software for storage & streaming is installed. Beta is 3/99 with full production in 4Q 1999.

They are looking at retiring X.500 & PH. The usage is very low (PH < 100 queries/day, X.500 < 70 queries/day). The top 4 users are infoseek, lycos, minocw.nl and dial-access.att.net. There will need to be an alternative, maybe pointing to the various sites' white page servers.

Email BOF

Attendees from FNAL, MITLNS, LBL & SLAC.

FNAL have been pushed towards an NT IMAP server. They are leaning towards Netscape Messenger since it has quotas. It now looks like it will move into production without a complete evaluation/comparison. All was fine until they hit about 110 users on a single server. For clients they use Netscape, Outlook Express and Pine; Pine is mainly for Unix users. They were also instructed to do software based RAID 5 on the servers, but have since abandoned it and now use disk mirroring. They do 50 users/mail box, and have about 600 users registered with 500 using it. They are looking to get to about 2000 users. They are using LDAP, which is necessary for that number of users. They don't migrate QuickMail address books.

There is a Netscape tool for QuickMail migration, but the LBL person does NOT recommend using it. There are 2 such tools, both written in Java by a single vendor, one customized for Netscape. They wrote a 500 line Perl script to make the conversion. The date field in a QuickMail message has a problem with Netscape; LBL has a fix. LBL has a web page, written in Perl, to migrate your address book to LDAP.

The LDAP database is synched with the LBNL people directory. Ex-staff are kept around in the HR database for 2 years. In January LBL will have a PeopleSoft to LDAP converter available. It only updates the LDAP for changes, and merges information into the database from both the LDAP entry and from PeopleSoft. The LBL LDAP server is on a 2 processor Ultra with 500MB and will be upgraded next year. LBL has 2 LDAP servers, one for look up for user interfaces (consumer), the other for server access. This way they limit the impact of long user lookups. LDAP has been a problem at LBL. There is one LDAP server for the whole Lab. LBL updates the LDAP database live; FNAL and CERN do updates overnight and then point to the new copy.
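
The "only updates the LDAP for changes" approach can be pictured as a diff between an HR extract and the current directory contents. The following is a minimal sketch in Python, not LBL's converter: it assumes both sides are plain dictionaries keyed by a hypothetical employee id, and the function name plan_updates and the attribute names are illustrative only. No real LDAP or PeopleSoft API is used.

```python
def plan_updates(hr_records, directory_entries):
    """Work out the minimal changes that bring the directory in line with HR.

    Both arguments map a (hypothetical) employee id to a dict of attributes.
    Attributes that exist only in the directory entry are left untouched, so
    information from both sources ends up merged in the directory.
    """
    adds, modifies, deletes = {}, {}, []

    for emp_id, hr in hr_records.items():
        current = directory_entries.get(emp_id)
        if current is None:
            # Person is in HR but not yet in the directory.
            adds[emp_id] = dict(hr)
            continue
        # Keep only the HR attributes whose values differ from the directory.
        changed = {k: v for k, v in hr.items() if current.get(k) != v}
        if changed:
            modifies[emp_id] = changed

    for emp_id in directory_entries:
        if emp_id not in hr_records:
            # Person has dropped out of the HR extract.
            deletes.append(emp_id)

    return adds, modifies, sorted(deletes)
```

Entries that are already up to date produce no operation at all, which is what keeps a live update cheap compared with rebuilding the whole database overnight.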

The new Netscape server can have multiple servers which are "multiplexed" to allow them to appear as one to the users. This allows load sharing, redundancy etc.

Issues are polling by clients (FNAL recommends 10 minutes), quotas (FNAL has 20MB, LBL has 100MB/user with no quotas, and if users go over the limit then LBL charges back), and limits on attachments. LBL likes the scalability of the Unix servers. LBL allows only 2 hours/month down-time (99% uptime) so a lot of attention is paid to availability (such as upgrades). A calendar reboot requires all users to be disconnected.
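
As a quick check on the down-time figure, the relation between allowed down-time and uptime percentage can be worked through directly. This is a minimal sketch in Python; the 30-day month and the function names are assumptions made for illustration. Under that assumption, 2 hours/month comes out at roughly 99.7%, which is consistent with the rounded 99% quoted above.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month, i.e. 720 hours


def uptime_percent(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Uptime as a percentage of the period, given the hours of down-time."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


def allowed_downtime_hours(uptime_pct, period_hours=HOURS_PER_MONTH):
    """Hours of down-time permitted in the period for a given uptime target."""
    return period_hours * (1.0 - uptime_pct / 100.0)


if __name__ == "__main__":
    print(round(uptime_percent(2), 2))             # 2 hours/month of down-time
    print(round(allowed_downtime_hours(99.0), 1))  # budget for a flat 99% target
```

Read the other way round, a flat 99% target would permit a little over 7 hours/month, so the 2 hour limit is the stricter of the two statements.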

LBL uses SSL to allow protected access to the IMAP server from the web. LBL still has about 1300 POP users and 2000 IMAP clients. FNAL has a few hundred POP users.

FNAL has 6 * 4GB drives with software RAID, and 2 * 200MHz Pentiums running NT acting as the IMAP server. The decision to stay with NT (vs. UNIX) will be made early next year. The software RAID became a performance problem when they reached 100 users. It also gave problems when a disk was lost and the reconfiguration took many hours to check each and every file. They have now gone to mirroring. The initial system with drives & chassis & software was about $20K. Now they split things off (using information from the knowledge base; they found Netscape technical support to be very poor). They broke up the directory with 50 people per message store so they can spread the load over multiple disks. The assignment is made at account creation time. They don't mirror the system disk yet, but this will be done. LBL loses about 1 drive/month out of about 90 * 9 Gbyte drives. FNAL felt this sounded high. They improved their reliability by paying attention to cooling. Lately FNAL have had some performance problems. FNAL looks at user quotas and sends a warning once a day if the user is within, say, 20% of the maximum. FNAL was worried about quotas for mail since physicists might use their mail boxes to save physics data (since they pay for physics data disks).
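
The two FNAL policies described above (50 people per message store, assigned at account creation, and a daily warning when a user is within about 20% of the quota) can be sketched as follows. This is an illustration in Python under stated assumptions, not FNAL's scripts: the function names, the idea of a sequential account number, and filling stores in order are all hypothetical.

```python
def assign_store(account_number, users_per_store=50):
    """Pick a message store index when an account is created.

    account_number is a hypothetical 0-based sequence number; stores are
    filled in order, 50 accounts to a store.
    """
    return account_number // users_per_store


def needs_warning(used_mb, quota_mb, margin_pct=20):
    """True when usage is within margin_pct of the quota, or already over it.

    Integer arithmetic is used so the boundary case is exact.
    """
    return used_mb * 100 >= quota_mb * (100 - margin_pct)


def daily_warnings(usage_mb_by_user, quota_mb=20):
    """Return the users who should get today's warning, in sorted order."""
    return sorted(user for user, used in usage_mb_by_user.items()
                  if needs_warning(used, quota_mb))
```

With the 20MB quota mentioned earlier, the warning threshold falls at 16MB, and the roughly 600 registered accounts would occupy 12 message stores.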

LBL has AIMS (Advanced IMAP Monitoring Services), see above, which can help determine the source of poor performance (by running locally versus running over the network). This code is available from LBL (contact Damon Houghland <dchoughland.lbl.gov>). The output statistics are available via http://mercproj.lbl.gov/working/AIMS.html and use JavaChart. Most of the performance problems come from LDAP (they have to look up in the LDAP server how to get to a user: does the user have an account, mailbox etc.). Unix clients are much faster than PC & Mac clients (due to the relative quality of TCP stacks). LBL has 5 people involved with developing IMAP/LDAP services including servers, clients, postmaster, & calendar. LBL is not responsible for backups for Email (they do not want the responsibility, which has legal ramifications); they will do emergency restores but charge $120 (and may take up to a week), unless the restore is wanted within a day, in which case they charge $250. Neither LBL nor FNAL uses the Netscape single message store since it gives problems with restoring, and the benefit of having only one copy of a message sent to many users is not worth the headaches with restores. Viruses are scanned for by InterScan Virus scan. If a virus is detected then the mail is returned to the sender with a message. There is an LDAP Perl module called PerlDAP which is available from Netscape (they hired a guy from Motorola who developed the precursor). LBL ran MailStone (runs on Unix & NT as clients) to do capacity measurements. It is available from Netscape (ask your Netscape representative). It is hard to find by searching the Netscape site.

ESnet Mail Working Group

Les Cottrell presented the results of a survey of Email (see http://www.slac.stanford.edu/grp/scs/net/talk/emtf-oct98/index.htm).

Issues raised by IMAP.

LBL is using a black box, but not in binary, so it is semi-black box. It is in DOS format on a Unix server. LLNL data is saved in standard Unix mail box format on a machine that one cannot log on to, so the information is only available via the POP/IMAP clients; it is therefore also semi-black box. LBL & LLNL do not allow .forward on the central server.