Current State of the Internet and Network Monitoring

In 1994 a subgroup of the ESCC, the Network Monitoring Task Force (NMTF), was chartered to investigate tools and techniques for monitoring traffic, reliability, and consistency of connections between ESnet sites and sites of interest on the greater Internet. In 1996, the group coalesced around work begun at the Stanford Linear Accelerator Center (SLAC) to monitor the connectivity between SLAC and the world-wide community of researchers using its facilities. The goals are to monitor end-to-end connections in order to: (1) determine trends, (2) help set user expectations, and (3) trouble-shoot connection problems between local ESnet sites and remote networks in North America and the world.

A small focal group consisting of networkers at the Brookhaven National Laboratory (BNL), the HEP Network Resource Center (HEPNRC), Oak Ridge National Laboratory (ORNL) and SLAC agreed to work together to:

  • come up with common recommendations for baseline monitoring for focal sites;
  • document requirements for hosts being monitored;
  • document and share tools for collection, filtering, analysis and report generation;
  • share information with the ESnet community and at appropriate venues.

Currently four sites (University of Maryland, HEPNRC, ORNL and SLAC) monitor over 200 remote sites in over 12 countries on 5 continents. Remote sites were chosen by polling the research community and/or sampling network accounting data. The measurements rely on the Internet ping utility, which is almost universally available, to measure end-to-end performance. From the ping data we can obtain short- and long-term measures of bottleneck bandwidth, available bandwidth, response time, packet loss, reachability, and predictability, all of which can be related to user applications. A major challenge has been finding simple, intelligible ways to characterize and visualize the enormous amounts of data.
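
As an illustration of how raw ping results might be reduced to some of the measures listed above, the following is a minimal sketch and not the group's actual tooling. It assumes each batch of pings has already been parsed into a list of round-trip times in milliseconds, with None marking a lost packet. The function name summarize_pings and the sample data are hypothetical. Bandwidth and predictability measures are not covered.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one batch of ping results.

    Each entry is a round-trip time in milliseconds,
    or None if that packet was lost (assumed input format).
    """
    sent = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]
    # Packet loss as a percentage of packets sent.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    return {
        "sent": sent,
        "received": len(received),
        "loss_pct": loss_pct,
        # A host counts as reachable if at least one reply came back.
        "reachable": bool(received),
        "min_ms": min(received) if received else None,
        "median_ms": median(received) if received else None,
        "max_ms": max(received) if received else None,
    }


# Hypothetical batch of ten pings to one remote host; two were lost.
batch = [42.0, 45.5, None, 41.2, 48.3, None, 43.0, 44.1, 46.7, 42.9]
summary = summarize_pings(batch)
print(summary)
```

Collecting such summaries per host over time would give the short- and long-term series for response time, packet loss, and reachability. The median is used here only because it is less sensitive to occasional very slow replies; the source does not say which statistic the group reports.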

The results indicate that, by most measures, performance within ESnet is good to excellent. However, packet loss between ESnet and the Internet at large is, on average, poor or worse for the hosts monitored. Packet loss seen from SLAC for non-ESnet hosts improved dramatically between April and June 1996, and the improvement has been sustained. In general, performance is highly variable in both the short and long term, particularly for international hosts. From SLAC, average monthly response times by host group are typically 300-500 ms for international hosts, 150-220 ms for eastern North American hosts, 80-140 ms for western North American hosts, and 40-50 ms for ESnet hosts.

The methodology is also being used to select Internet Service Providers (ISPs) and monitor their performance, possibly with a view to writing a service contract; to help decide which universities to connect directly to ESnet; and to identify bottlenecks in order to decide where to focus efforts and work-arounds.

Considerable interest and requests for information have been received from ESnet-related groups such as the International Committee on Future Accelerators (ICFA), as well as from outside the ESnet community. Presentations have been made at various federal network meetings (EOWG, CCIRN), and three papers have been submitted for publication in the last six months.