ISMA Meeting Trip Report San Diego, May-97

Les Cottrell, SLAC, last update May 2, 1997

Introduction

This was an invitation only meeting for people involved (mainly developers and heavy users) of Internet Statistics and Metrics Analysis (ISMA). There were about 65 attendees from ISPs, NSPs, Commercial users of the Internet, and researchers. More people were turned away than were accepted. It was held at the San Diego Supercomputer Center (SDSC) on the UCSD campus. There were 3 goals: foster awareness of the tools and measurements etc; share information get to some common ground and foster realtionships between vendors & ISP & users; get input to how CAIDA should evolve (it was recently funded for 3 years, at which time it will become commercially funded or no longer exist). There will be a report of the meeting so I will only report on things I found of interest.

Backbone

ANS actually gives money back if the performance (packet loss) is gretaer than some amount. Need AS to AS traffic matrices. The NSPs gather SNMP data. Some are working with Netflow, for example to get AS matrices. The public tools do not seem to be heavily utilized. They would like to encourage public tools. The only commercial tool used by ANS is SAS which is not a measurement tool. Sprint said they have a 50:50 split public for in-house developed vs commercial. MCI is all internal. UUnet is mainly using home grown tools. BBN is about 50:50, and some of the commercial was developed and commercialized by BBN. Some people said there was some very useful information available by looking at the NTP statistics (some damping or looping measures were mentioned).

Measurement Infrastructure

NIMI Jamshid Mahdevi - PSC

NIMI (National Internet Measurement Instrument) = industrial strength network probe daemon (npd a la Vern Paxson). Probe host scattered about network. Probe to probe measurements and end to probe measurements. Extend to end to end. Need more accurate clocks, packet filtering techniques to precisely timestamp packets on the wire. Eventually deploy on every campus/subnet, readily available, easy to deploy and configure etc. Major challenge is scaling to a large number of probes (e.g. 1M NIMIds with full meshing would be O(n^2) would fast overtake any number of trench diggers). Must optimize low bandwidth, high yield measurements. Want to have a meaurement archive. Another problem is to discover the topology of the NIMI mesh and the underlying Internet, one needs this for path decomposition, statistical decomposition of end-to-end tests. Data must be distributed, must deal with permissions & data scope, also use of caching of queries & results. NIMI is not a large scale hardware deployment, it is not centrally managed, it is not the IETF/IPPM, and it is not for profit.

Surveyor Infrastructure - Guy Almes (ANS)

Collaboration between ANS and 23 common solutions members (universities). Uses GPS (times good to < 1 us) interface from Truetime (~$3K) in a PC running FreeBSD, and a Web based results server. Users interrogate server using a browser, measurement machines upload results to web database server. Tests one way delays. Three surveyor machines ship in a week (Penn State, UMich, GWU). Antenna placement requires an installation lead time, e.g. excessive antenna cable lengths. Other problems run into include SSH portability problems with SGI, ATM not yet supported (expect summer 1997). Want to introduce the measurement machines at/near the right exchange points to reduce the scaling problems. With 25 sites & no exchanges there are 625 sites, with 5 second average randomized sample times, this gives 725 Gbytes/year.

Mike O'Dell of UUnet is doing something with ATM one way measurements.

IDD System - Mitchell S. Baltuch (Unidata)

See: http://www.unidata.ucar.edu/

CAIDA Tools Taxonomy - Kimberly Claffy (NLANR)

Highlights tools, activities, not a tool basher. See:

http://www.nlanr.net/Caidants/meastools.html.

Internet Analysis, Simulation, Modeling, Visualization: Challenges & Current State of the Art

Netsys - Karen Sage (Cisco/Netsys)

Have to understand the limitations of the tools, and realize the benefits. Can get for free a topology visualization of the network in terms of the BGP/IGRPinformation by looking at tables.

Virtual InterNet Testbed (VINT) - Debbie Estrin (USC/ISI)

See http://netweb.usc.edu/vint

UCB, LBNL, Xerox & USC/ISI collaboration. Built on LBNL/UCB ns tools. Idea is to extend measurement into more interacting protocols. Today the focus is narrow, e.g. simulation focuses on a single protocol, there is a lot of duplication. Want to extend beyond existing testbeds. The idea is to build tools that extend the existing simulation methods.

Planet Multicast - K. Claffy

Trying to visualize multicast tuneels etc. One tool is a traceroute tool that displays the latitude/longitude of the nodes along the path.

IPMA Routing/Network Statistics- Craig Labovitz (Merit)

Framework for collecting data. A joint Merit UMich project. Looking at routing instabilities, topology, accuracy/problems; latency, loss long/short term. They have lots of data, probe machines, so how does one use it all. They are looking at the UARC project for data colection/disemmination. There is a "Salamander" Web data server which gathers information from the probe machines. They do data replication. People who provide the data can provide key or scope (community). IPMA have probe daemons at major sites/exchanges, internal to ISP backbone. They BGP peer with ISP routers for stability, topology, availability, accuracy. They do not use GPS for timing, due to need for antennas. Can get good time measures as long as round trip is symmetric (e.g. via a dial up POTS line).

Routing Instability: most BGP info is pathological; many more route flaps (millions, i.e. a couple of orders of magnitude) than expected (40K routes say 2 flaps/day on average would give 100K). The raw data is available for public access. It is basically BGP data. There are Gbytes of it. For more see:

http://www.merit.net/ipma/analysis

Pathchar & Tcpanaly - Vern Paxson (LBNL)

For source and transparencies of pathchar see:

ftp://ftp.ee.lbl.gov/pathchar/

Open Panel on Performance Tools Panel

Flow Characterization Panel

Acccounting - Nevil Brownlee (University of Aukland)

See http://www.auckland.ac.nz/net/Internet/rtfm

In NZ they have to pay for their own use, so they had to get accounting going early on. Billing is one of the things one gets out of accounting. Network accounting relies on measuring traffic flows. Flows relate naturally to user tasks. Can select flow by application. Architecture consists of meters, meter readers, and managers.

Issues include:

granularity, i.e. adjacent address to/from router, peer address (IP address), transport address (protocol type, port), any combination of the above;
accuracy & reliability - how much is needed, must we count every packet, do we run redundant meters and/or collectors, how does the meter cope with unusual conditions, what happens to meter if readers and/or managers fail;
data reduction is done front-end (at meter) by selecting flows of interest, use bi-directional flows, compute derived information, building distributions, e.g. packet size, inter-arrival time;
data reduction - must collect data regularly, summarize into daily/weekly/monthly files, automate using scripts and cron jobs, need standard data format (RTFM meter MIB - RFC 2064 is the NeTraMet flow data file format).

They meter internal links at sites, and international links at the NZ Internet Exchange (NZIX). Use third quartile day measures to get round burstiness and encourage spreading the load. They tried higher quartiles but did not seem to make a big difference. The higher quartiles are more permissive and do not encourage spreading of the load as well.

OC3MON

OC12 NICs about $7K each from Applied Telecom of Chicago for PCI bus. They use a 90%/10% splitter to get at the fiber signal. Two prototypes have been manufactured. Putting both NICs in one PCI bus is a stretch (for bandwidth) but can be done. Then there is the issue of CPU power. First install June 1997.

For more on OC3MON see the NLANR site. The alternative is to use the router stats which may require waiting for new release or bigger faster router, and require a fast router interfaces to collect the data from. Today the OC3MON is a build your own job with a parts cost of about $6K including the PC (200MHz Pentium), 2 ATM NICs, splitters etc.

SLAC/HEPNRC ESnet Monitoring - Les Cottrell (SLAC)

See:

http://www.slac.stanford.edu/grp/scs/net/talk/isma-sdo-may97/index.htm

InterVu - John Leong (Inverse Technologies)

John Leong of Inverse Technologies reported on an interesting dial up ISP benchmarking tool to measure the performance of dial in ISPs (e.g. Compuserve, AOL) in terms of how fast their links actually were running at. Basically the tool is a Windows 95 PC with a USR modem . John said developing it for a W95 platform was painful, but they wanted to reasonably replicate the typical environment seen by many users. This tool is placed at strategic network points and dials up ISPs such as AOL, Compuserve etc. It notes the number of successes/failures (no answer, busy, unable to logon etc.) together with the modem speed achieved (if successful). He also had measures on DNS fails, retries, response time etc.

Timeit - John Sedayao (Intel)

Intel have a tool called Timeit that does HTTP GETs of pages from landmark Intel sites. It times the response. They separate out the DNS response. They report on DNS lookup, HTTP GET connect time, delivery rate and errors. They are looking to generate alerts by looking at the outliers. They are also using it in Service Level Agreements with their ISPs.

Miscellaneous Discussions

ANS put machines on major customer sites, that monitor other machines for the same customer at another site (but on the same ISP) and use these measurements as measuring whether the customer goals are met. These goals are in terms of availability, response time, loss. If the service quality goals are not met then the customer gets money back on the contract. UUnet also do something similar.

There was an interesting suggestion by John Hanley of Yahoo suggesting adding a wrapper around traceroute to find the AS's of the nodes on a route, plus the latittude/longitude of the node (by starting from the DNS entry) and then give you the address of a reverse traceroute server nearby.

Considerable concerns were raised about the non-scalability of active measurements, especially if every web user start doing such measurements (e.g. with NetMedic). However, people felt the active end-to-end measurements being pursued by SLAC/HEPNRC and the HEP community and the similar efforts at Intel are being done in a responsible fashion.

Intel blocks NetMedic (http://www.vitalsigns.com/) at the firewall. NetMedic is a network monitoring tool that could cause lots of network traffic if not judiciously used.