Question 1: Please provide clarification and details of the research issues intended to be addressed in this project.

 

This proposal provides an infrastructure to enable network measurements. There are research components both in the provision of the infrastructure and in the extension and analysis of the measurements made.

Research related to the infrastructure provision includes:

·                     How to provide fine-grained access control in order to control consumption of resources on a per-NIMI probe, per-user basis.

·                     How to build measurement tool resource profiles to allow the scheduler to check for competing resources and compensate accordingly.

·                     How to handle measurement expiration, e.g. how long to attempt to deliver a measurement, how long to store it on local disk, and how to support more granular expirations.
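
As an illustration of the resource-profile question above, the scheduler check could be pictured as a simple overlap test at scheduling time. This is only a sketch under stated assumptions: the `ToolProfile` fields, the capacity limits and the `can_coschedule` helper are hypothetical names introduced here for illustration and are not part of NIMI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical per-tool resource profile (all fields are assumptions)."""
    name: str
    bandwidth_kbps: float    # peak network load the tool generates
    cpu_fraction: float      # share of one CPU needed while running
    exclusive: bool = False  # True if the tool must not share the probe


def can_coschedule(running, candidate, max_kbps=1000.0, max_cpu=1.0):
    """Return True if `candidate` can run alongside the tools in `running`.

    The limits `max_kbps` and `max_cpu` are illustrative defaults, not
    values taken from the proposal.
    """
    # An exclusive tool cannot join others, and nothing can join it.
    if candidate.exclusive and running:
        return False
    if any(p.exclusive for p in running):
        return False
    total_kbps = candidate.bandwidth_kbps + sum(p.bandwidth_kbps for p in running)
    total_cpu = candidate.cpu_fraction + sum(p.cpu_fraction for p in running)
    return total_kbps <= max_kbps and total_cpu <= max_cpu
```

A real scheduler would also need to compensate (e.g. delay or reorder measurements) rather than simply refuse; that policy is part of the research question and is not sketched here.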

 

Some research issues involved with the extension and analysis of the measurements include:

·                     Improving the understanding of the prevalence of bursty packet loss on the Internet. This is increasingly important for real-time applications such as interactive voice over IP where bursty loss has a much greater impact than flat random losses. New methods of measuring and estimating this will be developed, deployed, reported on and evaluated.

·                     Measuring and understanding the prevalence of spurious packets (e.g. duplicates) and packet re-ordering.

·                     We will research ways to estimate packet-delay variability or "jitter" and compare and contrast various estimators for consistency, domain of applicability and computational efficiency.

·                     We will research how to quickly extract measurement results from the data in near realtime for the purposes of feeding back to applications (such as bbcp) the optimal strategy for improving performance while limiting the impact on others.

·                     How well do current Internet performance estimators (e.g. RTT, loss, jitter, reachability) scale to high-performance networks and to real-time reporting requirements (e.g. for network-aware applications), and what new metrics/estimators are needed?

·                     Today the IP Multicast Beacon displays real-time reachability and loss information across some 60 sites. ANL will develop methods of saving historic multicast-relevant data and correlating network events based on this information. Additional topological information would be useful to the debugging process, so methods for detecting changes in topology will be investigated.

·                     Assessing the components that go into end-to-end network path throughput: the interplay between delay, delay variation, and packet loss, per the TCP throughput equations.
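
To make the last item concrete, the interplay between delay and loss can be sketched with one widely cited TCP throughput approximation, throughput ≈ (MSS / RTT) · C / sqrt(p), usually attributed to Mathis et al. Which throughput equations will actually be used in the project is not specified above, so the choice of this formula, the constant C = sqrt(3/2) and the function name are assumptions made for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_prob, c=math.sqrt(1.5)):
    """Approximate steady-state TCP throughput in bits per second.

    Uses throughput ~= (MSS / RTT) * C / sqrt(p). The constant C depends
    on the loss model and ACK strategy; sqrt(3/2) is one common choice.
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0 < loss_prob <= 1:
        raise ValueError("loss_prob must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_prob)


if __name__ == "__main__":
    # Illustrative inputs only: 1460-byte MSS, 100 ms RTT, 1% loss.
    print(f"{mathis_throughput_bps(1460, 0.1, 0.01) / 1e6:.2f} Mbit/s")
```

The formula shows why both quantities matter for bulk transfers: halving the RTT doubles the estimate, whereas the loss rate must fall by a factor of four to achieve the same gain.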

 

Question 2: List the specific DOE/University sites that have agreed to host NIMI monitors and include letters of support from these projects.

 

We propose to supply NIMI platforms for 10 sites during the first 2 years of the project. We expect other sites to join as well, supplying their own hardware to support the NIMI platform. Given the close collaboration between these sites and the AIME lead sites, we do not anticipate problems with getting agreement to place NIMI probes at these sites.

 

Below is a brief list of the candidate sites and their relationship to the project.

 

·         SLAC, LBNL and PSC: lead AIME sites that already have at least one NIMI probe.

·         ANL, UTK: both AIME collaborator sites.

·         Caltech, FNAL, BNL, JLab, SDSC, GATech and Wisconsin: sites that are currently collaborating with SLAC on PingER monitoring at PPDG sites. These sites are also collaborating with SLAC on other measurement projects, details of which are listed below.

·         ORNL, LANL and Rice University: collaborators on other SciDAC proposals. Details are provided below.

·         International sites: IN2P3 in Lyon, France; INFN in Rome; Rutherford (RAL) and Daresbury Laboratories in England; and CERN.

 

More detailed information on projects and collaborations for these sites is given below.

 

·                     SLAC has already deployed PingER monitoring at PPDG sites with the active collaboration of those sites. The deployment of NIMIs at these sites is a natural extension, providing the finer granularity and additional detail required for high-performance support. The PPDG collaboration will enable placement of further NIMI probes at Caltech, FNAL, BNL, JLab, SDSC and Wisconsin.

·                     Caltech is also a major collaborator with SLAC on the BaBar project. Similarly, FNAL, BNL and JLab face major high-throughput performance issues for running or future experiments, so we expect them to be eager to use the support that AIME will provide.

·                     SLAC is collaborating on other SciDAC proposals (see section 3.2.5) with Rice University and LANL (INCITE), ORNL (Statistical Analysis), and SDSC/CAIDA (bandwidth estimation), so there are mutual benefits to placing NIMI probes at these sites.

·                     IN2P3 in Lyon, France is a tier A remote computing site for BaBar (located at SLAC), and RAL is proposing to become such a site. Each of these sites needs to copy large amounts of data between SLAC and its own site. These amounts start out at a few tens of Mbits/second continuous (for 1/3 of a year), growing by factors of two per year to hundreds of Mbits/second. In addition, the INFN/Rome site is a tier B BaBar remote computing site and also has large bulk throughput requirements. SLAC has been working very actively with each of these sites to measure, understand and improve bulk throughput performance. We believe that the deployment of AIME probes at these sites will facilitate the measuring and understanding of network performance and greatly assist in optimizing bulk throughput.

·                     The LHC project at CERN, which will start to take data in 2005, has massive bulk throughput requirements. CERN is also part of the PingER project, and SLAC has very strong contacts with Olivier Martin of CERN, who leads the WAN group there.

·                     LLNL is a collaborator with SLAC on the PEP-II/BaBar project and is a BaBar remote site for Monte Carlo simulations. As such it has high throughput requirements between LLNL and SLAC.

·                     GATech is a collaborator with SLAC on the SciDAC OPTS proposal.

 

Candidate AIME probe sites

Site             | Possible contacts                       | Interest                    | Deployment year/sequence priority
-----------------+-----------------------------------------+-----------------------------+----------------------------------
SLAC             | Les Cottrell                            | AIME                        | 0
LBNL             | Vern Paxson                             | AIME                        | 0
PSC              | Andrew Adams                            | AIME                        | 0
ANL              | Bill Nickless                           | AIME                        | 0.5
Caltech          | Harvey Newman                           | PPDG, BaBar, PingER         | 1
FNAL             | Frank Nagy, Ruth Pordes, Phil deMar     | PPDG, PingER                | 1
SDSC             | Regan Moore, kc claffy                  | PPDG, measurement/CAIDA     | 1
UTK              | Rich Wolski                             | AIME                        | 0.5
BNL              | Mike O’Connor                           | PPDG, PingER                | 2
ORNL             | Nagy Rao, Bill Wing                     | CEENPAR & Stats proposal    | 2
RAL or Daresbury | Peter Clarke, Robin Tasker, Paul Kummer | QoS proposal, BaBar, PingER | 2
Rice             | Richard Baraniuk                        | INCITE proposal             | 2.5
IN2P3            | Gilles Farache, Dominique Boutigny      | BaBar                       | 2.5
INFN             |                                         | BaBar, PingER               | 2.5
CERN             | Olivier Martin                          | LHC, PingER                 | 3
JLab             | Chip Watson                             | PPDG                        | 3
Wisconsin        | Myron Livny                             | PPDG                        | 3
LANL             | Wu-chun Feng                            | INCITE proposal             | 4
GATech           | George Riley                            | OPTS proposal               | 4
LLNL             |                                         | BaBar                       | 4

Notes:   

Deployment year = 0 indicates the probe is already in place

There are more potential sites than there are funded probes, so we do not anticipate problems in placing the probes.

Actual sequencing will depend on factors such as funding of related proposals, ease of installation, interest from remote site, progress on measurements etc.

 

Question 3: Will the outcome of this project be beneficial to mostly network planners or scientists using the network? Please state how.

 

The project benefits both network planners and scientists.

·                     Network planners will benefit from the provision and ready availability of long-term (up to several years) and short-term (near real-time) estimates of network performance, including round trip times, losses, jitter, prevalence of spurious packets and reordering, reachability, throughput, and routing. This will be valuable for troubleshooting, problem isolation (e.g. identifying when something changed and how), planning, setting expectations, setting and evaluating service level agreements, and providing input for performance prediction/tuning for network-aware applications.

·                     Scientists will benefit: from being provided with more realistic expectations of past, current and future performance; from improved network aware applications for bulk throughput such as bbcp that are better able to self configure and more agile in responding to network changes; from faster identification and remediation of performance problems.

 

Question 4: The extension of NIMI capability described in section 5.1 does not appear to be part of the deliverables from any of the participating PIs. Also missing are the deliverables associated with sections 5.1, 5.2, 5.3, 5.4 and 5.5. for each of the three years.

 

First year deliverables:

As indicated in sections 5.1 and 5.6.1, SLAC will report on research comparing and contrasting various ways of estimating “jitter” (inter-packet delay variability), implement one or more of the promising candidates in NIMI and PingER, and verify the conclusions. SLAC will also report on estimators for bursty packet loss. The first version of the new user interface to NIMI analysis and reports will be put into production. This will include reports on RTT, loss, reachability, spurious (e.g. duplicate) packets, reordering and “jitter”.
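
To indicate what comparing jitter estimators might involve, the sketch below computes three candidate estimators from the same series of delay samples: the mean absolute difference between consecutive delays, the exponentially smoothed interarrival jitter defined in RFC 3550, and the inter-quartile range of the delays. The choice of these three estimators and all function names are assumptions made for illustration; the candidates actually studied may differ.

```python
import statistics


def ipdv(delays):
    """Differences between consecutive delay samples (delay variation)."""
    return [b - a for a, b in zip(delays, delays[1:])]


def jitter_mean_abs(delays):
    """Mean absolute difference between consecutive delays."""
    diffs = ipdv(delays)
    return statistics.fmean(abs(d) for d in diffs) if diffs else 0.0


def jitter_rfc3550(delays):
    """Smoothed jitter per RFC 3550: J += (|D| - J) / 16 for each sample."""
    j = 0.0
    for d in ipdv(delays):
        j += (abs(d) - j) / 16.0
    return j


def jitter_iqr(delays):
    """Inter-quartile range of the delay distribution (needs >= 2 samples)."""
    q1, _, q3 = statistics.quantiles(delays, n=4)
    return q3 - q1


if __name__ == "__main__":
    # Made-up delay samples in milliseconds, for illustration only.
    sample = [50.1, 50.4, 49.9, 63.0, 50.2, 50.3, 50.0, 51.1]
    for fn in (jitter_mean_abs, jitter_rfc3550, jitter_iqr):
        print(fn.__name__, round(fn(sample), 3))
```

The estimators respond differently to the same data: the inter-quartile range is insensitive to a single outlier, while the two difference-based estimators depend on the order of the samples, which bears on the consistency and domain-of-applicability questions.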

 

SLAC will present early results on an instrumented high-performance throughput application (bbcp) that uses NIMI and PingER results to adjust to network conditions in real time. The application will be put into production for BaBar.

 

As indicated in section 5.5, PSC will extend the NIMI probes to 5 sites during Year 1 of the project. Deployment to these sites will include initial installation and configuration of the systems. During Year 1 these systems will continue to be maintained and administered by the AIME/NIMI team.

PSC, in collaboration with Vern Paxson, will also begin work on the new resource control tools outlined in section 5.3. During Year 1 PSC will begin developing and implementing the fine-grained access control for the NIMI platforms. We plan to have an initial version implemented and deployed on the NIMI platforms by the end of Year 1. We will also begin to investigate the issues associated with supporting inter-domain monitoring and defining measurement tool resource profiles. Prototypes for both will be defined and tested by the end of Year 1. We will also begin implementing generic packet filters.

     

ANL will develop and deploy the first version of the Beacon tools integrated into NIMI. Work will begin on storing and retrieving historic Beacon data.

 

Second year deliverables:

SLAC will extend the NIMI measurement tools to include IPv6 monitoring support, bandwidth estimators, and possibly cross-traffic estimators. A second version of the web user interface to NIMI analysis and reports will be deployed; the new estimators reported on will include bursty packet loss and bandwidth estimation. A first version of the web user interface to the traceroute measurements will be developed and put into production. A first version of a web user interface to the Beacon data will be made available. If the LBNL passive monitoring is funded, first results on the effectiveness of tying together passive and active measurements will be reported.
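
As a sketch of what a report on bursty packet loss could contain, the code below derives the overall loss rate, the lengths of consecutive-loss runs, and the conditional loss probability (the chance that a packet is lost given that the previous one was lost) from a per-packet received/lost record. These particular statistics and the function names are assumptions chosen for illustration, not the estimators the project has settled on.

```python
from itertools import groupby


def loss_runs(received):
    """Lengths of consecutive-loss runs; `received` is a list of booleans."""
    return [sum(1 for _ in grp) for ok, grp in groupby(received) if not ok]


def loss_stats(received):
    """Summarize loss burstiness for a sequence of per-packet outcomes."""
    n = len(received)
    lost = sum(1 for r in received if not r)
    runs = loss_runs(received)
    # Pairs (i, i+1) where packet i was lost, and where both were lost.
    after_loss = sum(1 for a in received[:-1] if not a)
    both_lost = sum(1 for a, b in zip(received, received[1:]) if not a and not b)
    return {
        "loss_rate": lost / n if n else 0.0,
        "mean_burst_length": sum(runs) / len(runs) if runs else 0.0,
        "max_burst_length": max(runs, default=0),
        "conditional_loss_prob": both_lost / after_loss if after_loss else 0.0,
    }


if __name__ == "__main__":
    # Made-up probe outcomes: True = received, False = lost.
    outcomes = [True, False, False, True, True, False, True, True]
    print(loss_stats(outcomes))
```

Comparing the conditional loss probability with the overall loss rate gives a simple indication of burstiness: for independent (flat random) losses the two are expected to be close, while a much higher conditional value indicates that losses cluster.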

 

A first version of some NIMI results (e.g. RTT, loss, “jitter”, bandwidth estimation) will be made available by SLAC via a documented interface (e.g. Grid Monitoring Service Architecture). Bbcp will be extended to utilize information from NWS to assist in adjusting to network conditions.

 

UTK will extend NWS to utilize the NIMI data for forecasting.

 

ANL will integrate a production version of the Beacon tools into NIMI. Additional features will be added to the Beacon to determine multicast topology information.

 

As indicated in section 5.5, PSC will continue to deploy NIMI probes at 5 additional sites, including some international HENP sites. We will continue to maintain and administer these platforms during the first half of the year. Assuming that the NIMI infrastructure is robust and stable, by the end of Year 2 we will have begun the process of establishing a separate ESnet NIMI partition with one or more logical administrative domains. We will continue to work on the development and deployment of the resource control mechanisms outlined in Section 5.3. Specifically, we will finish the development and deployment of the inter-domain monitoring and measurement tool resource profiles. We will begin implementing measurement expirations.

 

Third year deliverables:

SLAC will make NIMI traceroute measurements available publicly via a standard API. Results from the NIMI Beacon measurements will be made available. A recommendation will be provided concerning project continuation.

 

PSC will continue to provide any needed support for the ESnet NIMI partition.  While we don't plan to directly deploy any new NIMI platforms, we will work with sites that wish to deploy their own platforms.  We will finish up the development and deployment of the NIMI resource control features outlined in section 5.3, focusing on fixing any bugs and fine tuning the interaction of these tools with each other as well as the rest of the system.

 

ANL will add performance measurement features to the Beacon tools. This will allow stress testing of multicast infrastructures; at present the Beacon transmits only a very low level of multicast traffic.

 

Question 5: Since Rich Wolski of the University of Tennessee is not a funded PI, he may not be compelled to contribute to the critical work necessary for the success of the project. Similarly the contributions of Linda Winkler and Vern Paxson are also questionable.

 

Linda Winkler is listed in the Current and Pending Support section of the proposal as being funded from the current AIME proposal at the level of 9%, or $45K over the total project. She continues to push IP Multicast deployment as part of the Access Grid project. Her current efforts are focused on international deployment of the multicast Beacon in support of the SC Global project. The AIME project will provide very useful insight into the types of debugging and tracking information that are critical to gather. She is also very involved in ESCC, ESSC and ESRC activities, and she will work with ESnet staff and sites to implement and enhance their multicast infrastructure through better tools and measurement capabilities.

 

 

Rich Wolski provided a letter of intent to collaborate in this proposal, which was included at the end (Section 11.4). The provision of data from the NIMIs to the NWS will give the NWS and Rich Wolski a rich new source of information for developing new methods of forecasting performance. Rich is therefore eager to get the data and to work on understanding it and integrating it into NWS. This work is part of a separate SciDAC proposal (Net100) in which Rich Wolski participates, and which will provide a measurement and forecasting information base that will be valuable in the AIME context. Indeed, Rich did not ask for funding from the AIME proposal because there would be so much overlap between what he was proposing for Net100 and the AIME project that it did not make sense to apply for double funding. If NWS/Net100 is not funded, or the NWS team decides it is not in its interests to use our data to make forecasts, then we will focus on other ways to provide forecasts from the NIMI measurements. In particular, our close collaboration with Andy Hanushevsky, located at SLAC, the developer of a key PPDG application (the BaBar Copy program, bbcp, which provides high-performance bulk throughput), gives us the ability to tie the NWS forecasts into bbcp and/or to use the direct NIMI measurements to improve bulk throughput. We have already started putting hooks into bbcp so that it can be supplied with predictive network information from NIMI, NWS or other sources.

 

Dr. Paxson is deeply committed to the NIMI project, having worked on it for the last five years. As discussed in the PAM-2000 paper he and his NIMI colleagues wrote last year ("Experiences with NIMI"), the key problem of scaling the infrastructure up in a sound fashion remains. Deployment on ESnet offers a terrific opportunity for doing so, because it allows scaling in one dimension (size) while holding another dimension (administrative heterogeneity) constant, so that scaling to the next stage can be done in multiple steps rather than as one large step. Furthermore, Dr. Paxson's continued affiliation with LBNL means he has direct access to, and trust relationships with, ESnet staff, which will be of great practical help with the deployment and operation.

 

Question 6: Will ESnet have its own NIMI infrastructure or will it be part of the global NIMI infrastructure? If so has ESnet agreed to deploy NIMI boxes in its core?

 

The design is that ESnet laboratory high-performance sites will have NIMI hosts. Initially these will be part of the global NIMI infrastructure administered by PSC (see Section 5.5). Eventually, as the NIMI architecture improves, the system should be robust enough to allow partitioning the infrastructure into one or more logical administrative domains, each of which would then be responsible for administering the NIMI probes it contains. The creation of a separate ESnet and/or HENP domain is in keeping with NIMI's fundamental design goal of supporting administratively heterogeneous infrastructures. In the third year of the project, as part of evaluating how to provide a smooth transition to a production service, we will explore whether to create other NIMI administrative domains (e.g. for ESnet, the PPDG and/or HENP), and we plan to engage ESnet in these discussions and planning. We believe it is currently premature to start serious discussions with ESnet on this, since the full ramifications and the supporting infrastructure are still under development.

 

The deployment of NIMIs is expected to be at the national laboratory ingress/egress sites and not within the ESnet core, since the NIMIs will be making end-to-end performance measurements. It may also prove desirable to place NIMIs at ESnet peering edges, to facilitate pinpointing performance issues. This is not currently part of the proposal; however, it is a natural extension for the second or third year. The NIMI design allows remote administration and so enables such placement with very limited impact on ESnet apart from a small amount of collocation space. It will of course need to be discussed with ESnet management.

 

Given the close working relationship and frequent interactions of some of the AIME participants with ESnet (e.g. Vern Paxson spends time at LBNL, Les Cottrell is the chair of the ESnet Network Monitoring Task Force, Linda Winkler, Bill Nickless and Les Cottrell are all frequent contributors at various ESCC and ESSC meetings), the limited requirements from ESnet to support the placement of NIMIs at ESnet peering points, and the mutual benefits, we anticipate that such discussions should be very constructive.

 

Linda Winkler is also very involved in ESCC, ESSC and ESRC activities. She will work with ESnet staff and sites to implement and enhance their multicast infrastructure through better tools and measurement capabilities.

 

We also plan to explore sharing the NIMI probes with the LBNL passive monitoring proposal, assuming both projects are funded. We wish to work closely with that project, and sharing the resources could be a good way to encourage collaboration.