November 12-15, 1999
|Traveler||Roger L. A. Cottrell, Assistant Director SLAC Computing Services, SLAC, POB 4349, Stanford University, California 94309|
|Dates of Trip||November 12-15, 1999|
|Purpose of Visit||To attend and give a talk at the International Committee on Future Accelerators (ICFA) Standing Committee on Inter-regional Connectivity (SCIC) meeting at CERN on November 13. To hold discussions on security, wide area networking (WAN), email, directory services, Internet voice and networking with peers at CERN.|
The main purpose of this trip was to attend the ICFA/SCIC meeting and present a talk on Internet Quality of Service for National Research and Education Networks. I took detailed notes, which became the official notes for the meeting. This document covers the discussions that I had at CERN on Friday 12th November and Monday 15th November. The itinerary of the CERN visit was arranged by Olivier Martin, to whom I am very grateful.
Jiri is visiting CERN for 1 year from the Czech Republic. They capture 1 in 100 packets on the FDDI ring that the external networks connect to and then analyze the data to provide utilization by bytes, packets, hosts, host pairs etc. They are converting to use the Netflow version 5 (raw) and version 8 (aggregation) data formats. They have written their own programs to use the Netflow data. They looked at the CAIDA cflowd tools but found them too heavy, complex and unstable. Jiri started on the Netflow analysis about 2 weeks ago.
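As a rough illustration of how 1-in-100 sampled packets can be turned into utilization estimates, here is a minimal Python sketch. It is not the CERN code; the record layout `(src, dst, nbytes)` and the function name are assumptions made for the example, and it simply multiplies each sampled packet by the sampling rate.

```python
from collections import Counter

SAMPLING_RATE = 100  # 1 in 100 packets is captured, per the text


def estimate_traffic(sampled_packets, sampling_rate=SAMPLING_RATE):
    """Scale sampled (src, dst, nbytes) records up to estimated totals.

    The record layout is an assumption for this sketch, not the CERN format.
    """
    bytes_by_pair = Counter()
    packets_by_pair = Counter()
    bytes_by_host = Counter()
    for src, dst, nbytes in sampled_packets:
        # Each captured packet stands in for `sampling_rate` real packets.
        bytes_by_pair[(src, dst)] += nbytes * sampling_rate
        packets_by_pair[(src, dst)] += sampling_rate
        bytes_by_host[src] += nbytes * sampling_rate
        bytes_by_host[dst] += nbytes * sampling_rate
    return bytes_by_pair, packets_by_pair, bytes_by_host


if __name__ == "__main__":
    sample = [("a", "b", 1500), ("a", "b", 500), ("c", "a", 40)]
    by_pair, pkts, by_host = estimate_traffic(sample)
    print(by_pair.most_common(), pkts.most_common(), by_host.most_common())
```

Estimates for small flows are noisy at this sampling rate, since a flow of a few packets may not be sampled at all.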
In another project they are measuring link throughput capabilities. To do this they are using Netperf to inject traffic for 10 seconds every 5 minutes. This enables them to see the current performance and has been extremely valuable in pointing out problems requiring optimization of the TCP stack configurations (window sizes, use of SACK etc.). In the case of CERN to Caltech they improved performance by over an order of magnitude. This is akin to the work being initiated between SLAC (Davide Salomoni) & IN2P3/Lyon (Gilles Farrache) to improve performance for BaBar between those sites. Later Olivier and I discussed how to extend this work between SLAC and CERN. We agreed the 5 minute interval should be extended to say 30 minutes and the duration of each test extended say from 5 seconds to 30 seconds. Olivier is also concerned that the version of Netperf CERN is using does not allow one to vary the separation between UDP packets, so it can swamp the link. Kishan at SLAC has a version of ttcp from Becca that provides this ability, so we may want to move from Netperf to ttcp. Another possibility would be to move to NGEN. Olivier also suggested that Gilles Farrache may not be interested in making tests until the CERN-IN2P3 link is upgraded from 6Mbps to 34Mbps in (hopefully) February/March 2000.
Jiri also reads the BGP ASpath information out of the routers and provides a topological map of routers seen by CERN. For the first hop he also has a graphical representation of the utilization of each link shown by the thickness of each link.
CERN has been successfully using the Cisco transparent web cache. They recently tested the Network Appliance (NA) transparent web cache. The developer of the University of Colorado Harvest project is now with NA and they started from the CAIDA cache. One problem is that the NA cache does not refresh when using shift-Reload. Despite the extra cost, the NA product showed no clear superiority over Cisco. They will do another test with a Novell transparent cache. CAIDA made a test of various caches and Novell performed well. There is also a problem with transparent caches in general: if the cache is inside the firewall it will serve up site-only pages to offsite users. Cache hit rates are about 25% by byte and about 40% by object.
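The difference between the two hit rates quoted above can be made concrete with a small Python sketch. The input format, a list of `(size_bytes, was_hit)` pairs, is an assumption for illustration and not the log format of any of the caches mentioned.

```python
def hit_rates(requests):
    """Return (object_hit_rate, byte_hit_rate) for (size_bytes, was_hit) pairs."""
    total_objects = total_bytes = hit_objects = hit_bytes = 0
    for size, was_hit in requests:
        total_objects += 1
        total_bytes += size
        if was_hit:
            hit_objects += 1
            hit_bytes += size
    if total_objects == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hit_objects / total_objects, hit_bytes / total_bytes


if __name__ == "__main__":
    # Made-up requests: two small objects hit, three larger objects miss.
    requests = [(1000, True), (1000, True), (8000, False),
                (2000, False), (3000, False)]
    by_object, by_byte = hit_rates(requests)
    print(f"by object: {by_object:.0%}, by byte: {by_byte:.0%}")
```

When the objects that hit are smaller than the objects that miss, as in this made-up input, the hit rate by byte comes out lower than the hit rate by object.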
The initial bandwidth saving reasons for a cache are less critical now that one has adequate offsite bandwidth. However, caches also serve up pages faster. For example, FNAL pages from the caches are served in 25 msec versus several hundreds of milliseconds otherwise. This speedup is aided by the ability of the cache to centralize HTTP 1.1 use, which helps consolidate multiple flows from the many objects composing a page into a single flow.
The CERN offsite link is a 20Mbps link shared with IN2P3 and WHO. It will be upgraded to 34Mbps in March 2000. The big users are CERN & IN2P3. CERN hosts a CIXP (Commercial Internet eXchange Point) and many providers have now hauled fiber to the CERN CIXP. This may make it easier in future for CERN to upgrade capacity since the fibers are already in place.
Olivier has been looking at TCP performance on WAN links, in particular measuring the sequence numbers versus time of a flow with tcpdump and analyzing the results with tcptrace and xplot. In general the Internet is doing well for short flows but not so well for long flows. Even with high speed links, typical performance is poor (i.e. throughput is way below what one might expect). This is due to the need for larger windows (larger than the typical default of 64 kbytes) on long delay links with high bandwidth. Some TCP stacks such as Solaris allow setting of the larger windows (however, typically it needs root access), and Solaris has an experimental release of selective acknowledgement (SACK), which helps avoid resending many unnecessary packets when one packet is lost. Unfortunately SACK is not implemented in Linux. With no tuning Olivier got about 1Mbps between CERN & Caltech, but with tuning he got about 16Mbps on a 20Mbps link. There is a useful web page on tuning from the Pittsburgh Supercomputing Center folks at http://www.psc.edu/networking/perf_tune.html
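The window-size argument can be checked with simple arithmetic: a TCP flow cannot exceed one window per round trip, and filling a link needs a window of at least the bandwidth-delay product. The Python sketch below assumes a round-trip time of 170 ms purely for illustration; that figure is not a measurement from this visit. The 64 KB value is the largest window TCP can advertise without window scaling.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: the window (in bytes) needed to fill the link."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


LINK_BPS = 20e6                   # the 20Mbps link mentioned in the text
ASSUMED_RTT_S = 0.170             # ASSUMPTION for illustration, not measured
WINDOW_64K = 64 * 1024            # largest window without window scaling

if __name__ == "__main__":
    needed = bdp_bytes(LINK_BPS, ASSUMED_RTT_S)
    ceiling = window_limited_throughput_bps(WINDOW_64K, ASSUMED_RTT_S)
    print(f"window needed to fill the link: {needed / 1024:.0f} KB")
    print(f"ceiling with a 64 KB window: {ceiling / 1e6:.1f} Mbps")
```

Under this assumed RTT the window needed is several times 64 KB, which is consistent with the large gain seen from tuning. Smaller default windows lower the ceiling proportionally.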
A tool that Olivier has found useful is ioload. This uses SNMP to read real-time bytes/sec from the router MIBs and shows in an xload type plot. A second tool that would be useful is an extension of tcptrace for VoIP traffic. It would unscramble the various headers (IP, UDP, RTP) of the VoIP packet to get sequence numbers and identify what is going on.
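To show what unscrambling the RTP header involves, here is a minimal Python sketch that decodes the 12-byte fixed RTP header and counts gaps in the sequence numbers. It assumes the IP and UDP headers have already been stripped, and it is not an extension of tcptrace; the function names are invented for the example.

```python
import struct


def parse_rtp_header(payload: bytes):
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(payload) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", payload[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def count_lost(sequence_numbers):
    """Count missing packets from 16-bit sequence numbers, allowing wrap-around."""
    lost = 0
    for prev, cur in zip(sequence_numbers, sequence_numbers[1:]):
        gap = (cur - prev) & 0xFFFF
        # A forward gap larger than 1 means packets were skipped; gaps of
        # 0x8000 or more are treated as reordering and ignored here.
        if 1 < gap < 0x8000:
            lost += gap - 1
    return lost


if __name__ == "__main__":
    # Payload type 18 is the static RTP assignment for G.729.
    header = struct.pack("!BBHII", 0x80, 18, 1000, 160, 0x12345678)
    print(parse_rtp_header(header))
    print(count_lost([65534, 65535, 1, 2]))
```

Plotting the decoded sequence numbers and timestamps against arrival time would give a VoIP analogue of the tcptrace sequence plots.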
CERN will be moving the PingER monitoring to a new host with a new name in the near future, so we will need to coordinate to ensure the information is still available.
CERN now has 25,000 UTP-5 plugs; they started the structured wiring in 1995. They have 15K active IP addresses, > 1000 hubs, > 100 routers, 600-900 subnets, and several hundred switches. They also have 60-90 legacy FDDI rings, and 10% of the active IP addresses are still on coaxial cabling. The latter is mainly associated with outlying isolated areas, which cause 30% of the interventions and which they want to replace. The average desktop has 10Mbps shared Ethernet access with an average of 18-20 users/hub of which typically 8 are active (these averages do not include computer farms). They are not buying hubs anymore. The LAN architecture is similar to SLAC's, with a core of switches connected to switches in "Starpoints" (e.g. building closets with 100-1000 plugs). User devices such as desktops, hubs, and switches are located within 100m of the Starpoints and connect to the Starpoint switches. 100Mbps connections are provided at no cost to the user. Higher speed connections cost the user extra. User switches are procured, configured and managed by the computer center but paid for by the user. Hubs are free. They make about 1000 disconnects/connects per month.
They evaluated Gbps products from 14-16 manufacturers and eventually selected Cabletron. Jean Michel quoted Cisco's lack of support for RMON-II, the ability of Cabletron to dynamically change MAC addresses (they use this when a router/switch goes belly up so they can switch a new router into the old address), and the cost of Cisco kit. They will be replacing their existing Cisco equipment.
50% of the desktops on site are running Windows 9x. They will move to W2K. There are about 2000-3000 Unix/Xterm machines. There are also a large number of Macs and about 150 Shiva Fastpaths still deployed with LocalTalk. They only use protocol based VLANs (e.g. an AppleTalk VLAN). Besides Macs running AppleTalk they also have Linux & Sun machines running AppleTalk.
The network folks are concerned about allowing dynamic DHCP since they then have no idea of what is on the network. However, static addressing does not allow the user to insert a NIC card. There are a limited number of "portable" plugs at CERN which are left connected even if there is no use. These are typically for conference room or public area use of laptops. Note plugs are disconnected from switch ports after a period of inactivity in order to provide for efficient use of resources. These portable plugs typically have static DHCP addresses assigned and are so labeled. There are some special subnets where portable plugs with dynamic DHCP addresses are supported. The portable plugs are inside the firewall; they rely on site security to prevent intrusions. The DHCP server is a Digital Unix machine.
They also have strong logging of flows and record the MAC address of machines that connect.
The head of the security team at CERN, John Gamble, was unable to be present due to illness. John reports directly to the directorate and leads a team of 3 others (Lionel Cons, Paolo Moroni and Denise Heagerty). Besides the technical team (CERT), there is a security policy committee which created a CERN Security Rules document which will be published in January. Denise did not recall whether the policy committee has met recently.
In John's absence Denise ably assisted me. Denise is leading a project to investigate common authentication for use by CERN services, in particular the Windows 2000 and Linux 2000 projects. The idea is to reduce the number of login/passwords requested from users. The project is called CLASP (for Common Login and Access Rights across Services Plan). The first goal is to simplify use of services; a secondary goal is improved security, and elimination of clear text passwords is a desirable outcome. They plan to have assessed the feasibility by March/April 2000, including a list of services which can be covered by the authentication mechanism and a list of proposed steps and resource estimates. The initial thoughts are towards seeing if they can use Kerberos v5, how it ties into AFS, tying it into the Mail system passwords (which will be saved in LDAP), and seeing how to tie into web access. There is also interest in collaborating with other HEP sites with the goal of simplifying access between sites for authenticated users (cross-realm authentication).
We discussed several other issues including security for the business services people, use of RADIUS/TACACS, account termination policies, DHCP policies, clear-text passwords and ssh. However, for security reasons the results of those discussions are only available on a need-to-know basis.
CERN is running the Apache web server release 1.3.6 with plans to move to release 1.3.9 soon. CGI scripts have to be placed under a specific directory tree. The web administrator provides execution access on a per script basis. Every time a script is added to the server, the script writer is invited to check the script for security exposures by using a CERN written CGI parser that looks at the Perl code for a list of known possible security exposures (to first order it looks for "system", "exec", "open" and backtick commands "`" in Perl scripts). The list of exposures came from CGI gurus etc. If a new exposure is identified by a guru then the parser will need modifying. They have a student who updates the parser code as needed. They automatically check for new and modified CGI scripts. They have some documentation on exposures to look out for and workarounds. They do not modify CGI code; they expect the user to fix the code. They notify the script user if the code needs fixing, and will renotify the writer after 2 weeks if it is not fixed. The parser has been in test for 2-3 months. It could be ported to other sites (it is about 130 lines of Perl including comments and blank lines). They have no parser for REXX or Python or other CGI script languages. See http://www.cern.ch/WebOffice/Tools/CGIparser.html and http://www.cern.ch/WebOffice/Presentations/Tutorials/CGI_Scripting for more information.
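The first-order check described above (flagging "system", "exec", "open" and backticks in Perl source) can be sketched in a few lines. The following Python is only an illustration of the idea; it is not the CERN parser, which is written in Perl, and the names `RISKY_PATTERNS` and `scan_perl_source` are invented for the example.

```python
import re

# First-order list of constructs to flag, taken from the description above.
RISKY_PATTERNS = {
    "system": re.compile(r"\bsystem\b"),
    "exec": re.compile(r"\bexec\b"),
    "open": re.compile(r"\bopen\b"),
    "backtick": re.compile(r"`"),
}


def scan_perl_source(source: str):
    """Return (line_number, pattern_name, line_text) for each flagged line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip whole-line comments only; trailing comments are kept
        for name, pattern in RISKY_PATTERNS.items():
            if pattern.search(stripped):
                findings.append((lineno, name, stripped))
    return findings


if __name__ == "__main__":
    sample = (
        'my $f = param("file");\n'
        "open(F, $f);\n"
        "# system call here\n"
        "my $out = `ls $f`;\n"
    )
    for lineno, name, text in scan_perl_source(sample):
        print(f"line {lineno}: {name}: {text}")
```

A pattern match like this only points the script writer at lines worth reviewing. It produces false positives (for example a safe "open" of a fixed file name) and does not follow tainted data through the script.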
CERN has a major project to provide next generation "groupware" to users. In this context "groupware" means integrated email, mailing lists, directory services, and calendaring with roaming support. Roaming support in its simple form in this context means having your preferences, bookmarks, address books etc. on a remote server, thus enabling you to be location independent. A key component is the use of the Lightweight Directory Access Protocol (LDAP). LDAP provides mail, calendar, web etc. profiles (needed for roaming), as well as the more standard address book functionality. There can be multiple linked directories, and several Labs are collaborating on this including IN2P3, DESY, RAL and FNAL. There can be multiple schemas accessing single sets of data. This can be used to provide multiple views of the data: for example, people at a Lab may have fairly complete access to its phone book information (or wherever it keeps its Enterprise wide people etc. database, such as BINlist at SLAC) whereas people from outside may see a small subset. In addition, one schema can be used to extract a common subset of information out of multiple phone books kept at different Labs. Thus one goal is to set up a global HEP phone book where each Lab keeps its own phone book and provides a common HEP schema (the appearance of the schema from outside is identical for all Lab phone books) for other Labs to view/search the information. Arnaud demonstrated Netscape's LDAP search tool, which provides wildcard and "sounds-like" matching, to search the CERN phone book on the web by first name, last name, department etc. For SLAC to join in this we would need to set up a HEP schema to allow access to the SLAC phone book.
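The idea of a common HEP view over separately maintained phone books can be sketched with plain dictionaries. This Python example does not talk to an LDAP server; the choice of attributes in `HEP_COMMON_ATTRS`, the function names and the sample entries are all assumptions made for illustration.

```python
# ASSUMED common subset of attributes that every Lab would expose to outsiders.
HEP_COMMON_ATTRS = ("cn", "mail", "telephoneNumber", "o")


def external_view(entry, allowed=HEP_COMMON_ATTRS):
    """Return only the attributes of a phone book entry that outsiders may see."""
    return {attr: value for attr, value in entry.items() if attr in allowed}


def search_hep_phonebook(phonebooks, name_fragment):
    """Search every Lab's phone book by name and return the external views."""
    fragment = name_fragment.lower()
    results = []
    for entries in phonebooks.values():
        for entry in entries:
            if fragment in entry.get("cn", "").lower():
                results.append(external_view(entry))
    return results


if __name__ == "__main__":
    # Made-up entries; each Lab keeps its own extra, internal-only attributes.
    phonebooks = {
        "LabA": [{"cn": "Ann Example", "mail": "ann@example.org",
                  "roomNumber": "B1", "o": "LabA"}],
        "LabB": [{"cn": "Bob Sample", "o": "LabB", "homePhone": "555"}],
    }
    print(search_hep_phonebook(phonebooks, "ann"))
```

Each Lab's full entry stays in its own directory; only the agreed subset crosses the boundary, so the result looks the same whichever Lab it came from.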
An example of profile use is that LDAP would be the standard access method to get at email profiles, AFS profiles (pts information, quota, home directory), and mailing list membership and access controls. The future Windows Active Directory (WAD) will be populated via LDAP (at its core WAD is an LDAP application), and Arnaud expects to keep, or at least access, passwords & userids through LDAP. Another major concept that is effectively enabled by LDAP is the idea of groups. This is initially driven by the mailing list application; however, it is also a critical component of the WAD, and Arnaud expects to use it to keep CERN organization group information (what groups are in the IT division, who is in what group, who is the supervisor, the administrative aide etc.). Future sharing of email folders will be enabled through LDAP, as will control of which groups of people have access to certain web pages. The integration of LDAP access to the various databases of today (WAD, AFS, email, calendar, groups, mail lists etc.) is something that will be worked towards, with scripts being written initially to copy the data to LDAP so that it may be read only. Later, in some cases, LDAP may be the major repository for some of the data, which is then automatically copied to other places from LDAP.
Arnaud is very much in favor of the new Sun/Netscape Alliance future offerings. He feels they will have the required integration of the various tools, and be scalable for large email providers such as ISPs. This latter has a nice spinoff in that it may leave open a future path where one out-sources a site's email to an ISP, since they could provide similar functionality with an identical front end. Arnaud said it would be very useful to ask our Sun sales folks for a demonstration of "iplanet" to show some of the ideas the Sun/Netscape Alliance have in mind for LDAP and groupware. The Alliance have a roadmap that will merge their (initially) products in this area. The new products (calendar, email) are still in development and not available yet, so if one has to move quickly then they may not be ready in time. The Sun mail server (SIMS) project included Innosoft's PMDF.
Arnaud is not too hopeful about the Microsoft Exchange product. It works well for small groups but does not scale well. It was architected based on X.400/X.500 just before Gates said let's go Internet. A redesign to move away from X.400/X.500 is a major problem. Exchange needs heavy support just to get started, and one needs to check how it handles issues such as spam and mail-routing. Several sites have consistently reported corruption of Inboxes at peak usage times, there are reports of registry corruption, and Exchange servers appear to be prone to denial of service attacks (I think by viruses). Gartner says that big email servers (e.g. ISPs such as Yahoo or Microsoft's own Hotmail, despite heroic efforts by Microsoft to move to their own product) do not deploy Microsoft Exchange to provide the services, relying instead on Unix based products. David Foster is also disappointed in the Microsoft calendar.
The existing VoIP pilot between CERN, DESY, SLAC, FNAL & ESnet has been based on Cisco 3640 routers. It has been successful in showing that the technology can work well in some cases. CERN wanted to upgrade to a more recent release of the Cisco code (12.0.5 T3) but this caused problems for the other sites, and FNAL had problems when they upgraded so they had to back it out, and CERN followed suit. CERN tried using NetMeeting with the project (it says it supports the G.729 protocol that Cisco supports) but it did not work. Another problem was the variable number of digits in DESY numbers. VoIP calls to DESY also often have poor quality. Also, some people at CERN have got new digital phones (maybe an ISDN interface) which don't always seem to hang up a VoIP call properly.
CERN wants to start a pilot to evaluate Ethernet phones. This will be used as input to how to provide phone capability for the new LHC building that won't start before the end of next year. There will be some new offering of Ethernet phones early next year and they expect to pay a few hundred dollars per phone set. Cisco is selling its (ex Selsius) Ethernet phone system in Switzerland. A call manager plus 4 phones costs $24K list today. There are questions on how they do voice mail, how to do call forwarding, conference calls etc., as well as security, and how modems work.
|Jiri Navratil||Monitoring, traffic statistics|
|Jean Michel Jouanigot||CERN LAN plans|
|Arnaud Taddei||Email, mailing lists, LDAP|
|Rainer Toebikke||IP telephony, remote access, QoS|
|Olivier Martin||WAN update|
|Denise Heagerty||Security & authentication services|
|Monica Matinucci||Web CGI scripts|
|Tuesday, November 9||Leave Menlo Park|
|Wednesday, November 10||Arrive Paris|
|Friday, November 12||Fly Paris - Geneva, Discussions with CERN folks|
|Saturday, November 13||ICFA/SCIC meeting|
|Monday, November 15||Discussions with CERN folks|
|Tuesday, November 16||Fly Geneva - San Francisco, arrive Menlo Park|