Rough Notes of XIWT/IPWT Meeting at Intel 11/13/97

Les Cottrell

Deployment Status

They hope to have 10 sites running the PingER tools soon. At the moment the status is: Bell South (2 sites running), SBC (in process soon), Digital (2 in process soon), Houston Associates (1 soon), US West (1 site running), West Pub (running), CyberCash (to come), TC Labs (to come), HP (running), NIST (2 running), CNRI (1 soon). This comes to 15 collection hosts. We need at least 15-20 sites (rule of thumb) to allow aggregation of non-Gaussian data; thus one needs 15-20 sites for each aggregation (e.g. the US North-west region would need 15-20 sites).

It was agreed to do full mesh pinging between collection sites. There was a discussion of the value of pinging the root name servers. The choice of sites to ping is predicated on how one wishes to use the data.

Goals of Measurement Project

There was a discussion of whether we should be measuring the performance of important applications (e.g. Web, Internet phone, real audio, video). This appears to be a separate goal, which would probably need different tools (e.g. HTTP GET), and problems might be addressed to different people (e.g. address ping performance to ISPs, address Web performance to the WebMasters at the remote sites). The end point one wants to get to is to understand customer perceptions of application performance. However, one needs to start reasonably simply, so the initial focus is on the network performance via the

Data Needed to Address Goals:

Measurements Other Than Ping

DNS response is already separated from the ping performance. Does one want the DNS performance and, if so, how does one get it (e.g. which name servers should the response be measured against)? It was agreed that this should be a separate future project.

Getting HTTP GET response times is a future goal.

Quality of service related data is a future goal.

Traceroute has 2 contexts:

An Overview of NIMI (National Internet Measurement Infrastructure) - Vern Paxson, LBNL

NIMI is a collaboration between LBNL & PSC. It is an NSF-funded (for 1 year) pilot project that is also supported by DOE. It includes the development of tools with the goal of widespread deployment of measurement platforms for diagnosing performance problems, taking baseline measurements to see how traffic evolves, and assessing the performance delivered by IP clouds. The general model is to have platforms situated at the borders of clouds, measuring the traffic through the clouds with an N**2 type measurement. The original deployment for Vern's thesis included about 20 sites in Europe, U.S. & Asia (~ 380 pairs).
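The pair count quoted above follows from the N**2 model: with N platforms, each one measures towards every other, giving N*(N-1) ordered pairs. The sketch below is only an illustration of that arithmetic; the site names are placeholders and the function name full_mesh_pairs is made up for this note, not part of NIMI.

```python
from itertools import permutations


def full_mesh_pairs(sites):
    """Return every ordered (source, destination) pair with source != destination."""
    return list(permutations(sites, 2))


# Placeholder site names (hypothetical, for illustration only).
sites = [f"site{i:02d}" for i in range(1, 21)]
pairs = full_mesh_pairs(sites)

# 20 sites -> 20 * 19 = 380 ordered pairs.
print(len(pairs))
```

The same arithmetic applied to the 15 PingER collection hosts mentioned earlier gives 15 * 14 = 210 ordered pairs for a full mesh.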

The design goals & constraints are:

Making a measurement requires basic requests:

Security model:

Auto-configuration goal is to keep platforms lean & mean:

Hardware/software platform

Status

Poisson Ping ("Poip")

Future

Meaty Research Issues

What Analysis & visualization capabilities are needed?

After some discussion it was agreed that it may be premature to define new capabilities; rather, it might be more rewarding to see what exists and try it out with the data XIWG are going to gather.

What does SLAC/HEP do?

Time averaging:

Metrics

Cuts

Reports

Intel Demonstration

Strong time of day (e.g. night better than day) dependencies indicate queuing and that the link is in need of improvement.

Compare customer expectations for a metric and its variability against the actual measurements. This is related to something known as the Cpk (pk = process capability). Cindy Bickerstaff of Intel will provide information on this, what it requires and how to calculate it.
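Pending the information from Intel, the textbook definition of the process capability index can be sketched as follows. It compares the distance from the sample mean to the nearest specification limit with three sample standard deviations. This is the standard quality-control formula, not necessarily the exact procedure Intel uses; the function name, the sample values and the limits are assumptions made for illustration.

```python
import statistics


def cpk(samples, lower=None, upper=None):
    """Textbook Cpk: min((upper - mean), (mean - lower)) / (3 * sigma).

    Either limit may be omitted, e.g. a response time normally has only
    an upper limit set by customer expectations.
    """
    if lower is None and upper is None:
        raise ValueError("at least one specification limit is required")
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation, needs >= 2 points
    if sigma == 0:
        raise ValueError("samples have zero spread; Cpk is undefined")
    candidates = []
    if upper is not None:
        candidates.append((upper - mean) / (3 * sigma))
    if lower is not None:
        candidates.append((mean - lower) / (3 * sigma))
    return min(candidates)


# Hypothetical response times in ms with an assumed customer limit of 106 ms.
rtts = [98, 100, 102, 100, 100]
print(round(cpk(rtts, upper=106), 3))
```

Note that the index assumes roughly Gaussian data, so it would need care when applied to ping data, which the notes above describe as non-Gaussian.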

Keynote

This is a two-and-a-half-year-old start-up with 18 people. Its mission is to measure Internet performance and to provide raw data & analysis. The flagship product is Keynote Perspective. They basically measure the response time to get a URL, repeating the measurement every 15 minutes. A given page at a customer site is downloaded from multiple (39-40) Keynote locations, and they find order of magnitude differences in the time to download a Web page. Keynote locations in a given city may have multiple ISP connections. The Keynote boxes are located at Web collocation sites with national backbone connections (to eliminate problems caused by the monitoring site having a poor connection to the Internet) and are remotely managed. They provide a plot of y = time, x = measuring site, sorted to show which sites are worst over some time frame. They are seeing an increase in sites with connections to multiple ISPs. The ISPs are reselling Keynote measurement services and so are willing to collocate Keynote sites. They also work with Keynote customers to identify the effect of changes (e.g. a change in the peering). The alarms also add traceroute information. They attempt to identify the effects of the LAN and server/database performance and separate them from the network performance by monitoring response time locally and remotely.
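The basic measurement described above (time how long it takes to get a URL, repeat every 15 minutes) can be sketched with the Python standard library. This is a minimal illustration and not Keynote's implementation; the function names, the 30 second timeout and the choice to record a failed fetch as (None, 0) are all assumptions.

```python
import time
import urllib.request


def fetch_url(url, timeout=30.0):
    """Download the URL and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def timed_fetch(url, fetch=fetch_url):
    """Return (elapsed_seconds, bytes_received) for one download."""
    start = time.perf_counter()
    size = fetch(url)
    return time.perf_counter() - start, size


def monitor(url, interval_s=15 * 60, rounds=4, fetch=fetch_url, sleep=time.sleep):
    """Repeat the timed download `rounds` times, `interval_s` seconds apart."""
    results = []
    for i in range(rounds):
        try:
            results.append(timed_fetch(url, fetch))
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses;
            # a failed fetch is recorded as (None, 0) in this sketch.
            results.append((None, 0))
        if i < rounds - 1:
            sleep(interval_s)
    return results
```

The fetch and sleep parameters exist only so the loop can be exercised without a network or a real 15 minute wait, e.g. monitor(url, rounds=3, fetch=lambda u: 0, sleep=lambda s: None).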

Benefits:

Service Delivery Options:

Monthly Service Subscription:

The customers want a single, easily understandable quality metric that describes the user experience. They are confused when people discuss the detailed meanings of statistics, so using terms like medians and inter-quartile distances does not help understanding. Keynote today uses simple averages and standard deviations.
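The difference between the two families of statistics mentioned above shows up clearly on skewed response-time data. The sketch below computes both for a small made-up sample containing one very slow measurement; the data values and the function name summarize are invented for illustration and are not Keynote measurements.

```python
import statistics


def summarize(samples):
    """Return mean/stdev alongside median/inter-quartile range for comparison."""
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartile cut points
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
        "median": statistics.median(samples),
        "iqr": q3 - q1,
    }


# Hypothetical download times in ms; the last value is a single outlier.
times_ms = [100, 105, 110, 115, 120, 125, 130, 2000]
for name, value in summarize(times_ms).items():
    print(f"{name:>6}: {value:.1f}")
```

With this sample the single outlier pulls the mean well above the median and inflates the standard deviation far beyond the inter-quartile range, which is why the median-based figures are often preferred for non-Gaussian data even though they are harder to explain to customers.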

Discussions

What are the requirements for the archive site? According to Dave Martin of HEPNRC, probably about 25% of an FTE is consumed just running the ESnet archive. This FTE percentage includes helping new collection sites get started, and ensuring that the data is accurately gathered in a timely fashion. Other costs include the hardware costs and the SAS license.

Is there a requirement for multiple archive sites? ESnet looked at this and decided against it, since it did not seem to provide sufficient benefits to justify the extra costs. The benefits could be faster access to the Web pages from different parts of the world by mirroring data, or load sharing. Neither is compelling at the moment. It may be useful to have an additional (non public, and not necessarily complete or current) copy of the data at a site which is developing analysis tools.

CNRI will try to get the archive site running by the new year. They hope to get a Sun/Solaris system with a SAS license.

The members agreed to share the data / information gathered amongst themselves. They may make some of the information publicly available after suitably anonymizing it.

A small subgroup of 3 people agreed to look at instrumenting Web servers (e.g. by providing extra information in the log) to provide better passive measurements of performance. Cindy Bickerstaff will head up the group.

There was also interest in setting up a subgroup to look at what metrics are needed to set up Service Level Agreements (SLAs) with ISPs. For example, ISPs will probably not accept ping measurements to things they do not control, e.g. to a customer site host.

Are there any controlled experiments that can be made by perturbing the system and looking at the effect?

Goals of next meeting

Have collection sites running. Share experiences. Discuss how to analyze and visualize. Intel is interested in analyzing the data with some of their tools which they are working with lawyers to make public.

Have archive site running.

Time of Next Meeting

December is too soon. Late January was proposed. There is an XIWT meeting in Austin in early February (3rd and 4th) that some members of this group (XIWG/IPWT) will attend. Another possibility would be San Diego: Tracie Monk offered to host a meeting there as long as somebody will sponsor it to cover the cost of refreshments etc. It was agreed to hold it in Austin on the 2nd of February.

There should be a conference call the week of January 14th at 1pm EST. Try to put presentation materials on the Web before the conference call.