Question 1: Please provide clarification and details of the research issues intended to be addressed in this project.
This proposal provides an infrastructure to enable network measurements. There are research components both to the provision of the infrastructure and the extension and analysis of measurements made.
Research related to the infrastructure provision includes:
· How to provide fine-grained access control in order to control consumption of resources on a per-NIMI probe, per-user basis.
· How to build measurement tool resource profiles to allow the scheduler to check for competing resources and compensate accordingly.
· Measurement expiration, e.g. how long to attempt to deliver a measurement, how long to store on local disk, as well as more granular expirations.
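As an illustration of the resource-profile idea above, the following is a minimal sketch of how a scheduler might check a candidate measurement against runs already scheduled on a probe. It is not the NIMI scheduler: the `ResourceProfile` fields, the `needs_quiet_host` flag and the `competing_resources` function are all hypothetical names chosen for this example, and a real profile would likely need to track more resources.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical per-tool resource profile (all field names are assumptions)."""
    tool: str
    peak_bandwidth_mbps: float  # network load the tool may generate
    cpu_fraction: float         # share of the probe's CPU the tool may use (0..1)
    needs_quiet_host: bool      # True if the tool's timing suffers under contention


@dataclass(frozen=True)
class ScheduledRun:
    profile: ResourceProfile
    start: float  # seconds since an arbitrary epoch
    end: float


def overlaps(a: ScheduledRun, b: ScheduledRun) -> bool:
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def competing_resources(candidate: ScheduledRun,
                        scheduled: List[ScheduledRun],
                        link_capacity_mbps: float,
                        cpu_budget: float = 1.0) -> List[str]:
    """Return the resources (possibly none) that `candidate` would compete for."""
    concurrent = [run for run in scheduled if overlaps(candidate, run)]
    if not concurrent:
        return []
    everyone = [candidate] + concurrent
    reasons = []
    # A timing-sensitive tool conflicts with anything running at the same time.
    if any(run.profile.needs_quiet_host for run in everyone):
        reasons.append("quiet-host")
    if sum(run.profile.peak_bandwidth_mbps for run in everyone) > link_capacity_mbps:
        reasons.append("bandwidth")
    if sum(run.profile.cpu_fraction for run in everyone) > cpu_budget:
        reasons.append("cpu")
    return reasons
```

A scheduler built along these lines could compensate for a non-empty result by delaying the candidate, or by flagging the affected measurements as taken under contention.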
Some research issues involved with the extension and analysis of the measurements include:
· Improving the understanding of the prevalence of bursty packet loss on the Internet. This is increasingly important for real-time applications such as interactive voice over IP where bursty loss has a much greater impact than flat random losses. New methods of measuring and estimating this will be developed, deployed, reported on and evaluated.
· Measuring and understanding the prevalence of spurious packets (e.g. duplicates) and packet re-ordering.
· We will research ways to estimate packet-delay variability or "jitter" and compare and contrast various estimators for consistency, domain of applicability and computational efficiency.
· We will research how to quickly extract measurement results from the data in near realtime for the purposes of feeding back to applications (such as bbcp) the optimal strategy for improving performance while limiting the impact on others.
· How well do current Internet performance estimators (e.g. RTT, loss, jitter, reachability) scale to high performance networks and to real-time reporting requirements (e.g. for network aware applications), and what new metrics/estimators are needed?
· Today the IP Multicast Beacon displays realtime reachability and loss information across some 60 sites. ANL will develop methods of saving historic multicast-relevant data and correlating network events based on this information. Additional topological information would be useful to the debugging process, so methods for detecting changes in topology will be investigated.
· Assessing the components that go into end-to-end network path throughput: the interplay between delay, delay variation, and packet loss, per the TCP throughput equations.
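To make the last item concrete, here is a minimal sketch of one well-known simplified TCP throughput equation, the Mathis et al. model, in which throughput is roughly (MSS / RTT) × C / √p for loss probability p. Choosing this particular model is our assumption for illustration; it covers only delay and loss, and the function and parameter names are hypothetical.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_prob: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate steady-state TCP throughput in bits per second.

    Simplified Mathis model: rate ~ (MSS / RTT) * C / sqrt(p).
    C = sqrt(3/2) is the constant usually quoted for periodic loss
    with an acknowledgement for every segment.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_prob <= 1:
        raise ValueError("loss_prob must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_prob)


if __name__ == "__main__":
    # Example: 1460-byte segments, 100 ms RTT, 0.01% loss.
    rate = mathis_throughput_bps(1460, 0.100, 1e-4)
    print(f"{rate / 1e6:.1f} Mbits/s")
```

In this model, halving the round trip time doubles the achievable throughput, while the loss rate must fall by a factor of four for the same gain, which is one reason small loss rates matter so much on long high-bandwidth paths.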
Question 2: List the specific DOE/University sites that have agreed to host NIMI monitors and include letters of support from these projects.
We propose to supply NIMI platforms for 10 sites during the first 2 years of the project. We expect other sites to also join, supplying their own hardware to support the NIMI platform. Given the close collaboration between these sites and the AIME lead sites, we do not anticipate problems with getting agreement to place NIMI probes at these sites.
Below is a brief list of the candidate sites and their relationship to the project.
· SLAC, LBNL and PSC: lead AIME sites, that already have at least one NIMI probe.
· ANL, UTK: both AIME collaborator sites.
· Caltech, FNAL, BNL, JLab, SDSC, GATech and Wisconsin: sites that are currently collaborating with SLAC on PingER monitoring at PPDG sites. These sites are also collaborating with SLAC on other measurement projects, details of which are listed below.
· ORNL, LANL and Rice University: collaborators on other SciDAC proposals. Details are provided below.
· International sites: IN2P3 in Lyon, France; INFN in Rome; Rutherford (RAL) and Daresbury Laboratories in England; and CERN.
More detailed information on projects and collaborations for these sites is given below.
· SLAC has already deployed PingER monitoring at PPDG sites with the active collaboration of the PPDG sites. The deployment of NIMIs at these sites is a natural extension, providing the finer granularity and greater detail required for high performance support. The PPDG collaboration will enable placement of further NIMI probes at Caltech, FNAL, BNL, JLab, SDSC and Wisconsin.
· Caltech is also a major collaborator with SLAC for the BaBar project. Similarly, FNAL, BNL and JLab have large high throughput performance issues for running or future experiments so we expect them to be anxious to utilize the support to be provided by AIME.
· SLAC is collaborating with other SciDAC proposals (see section 3.2.5) with Rice University and LANL (INCITE), ORNL (Statistical Analysis), and SDSC/CAIDA (bandwidth estimation) so there are mutual benefits to placing NIMI probes at these sites.
· IN2P3 in Lyon, France is a tier A remote computing site for BaBar (located at SLAC), and RAL is proposing to become such a site. Each of these sites has requirements to copy large amounts of data between SLAC and their site. These amounts start out at a few tens of Mbits/second continuous (for 1/3 of a year), growing by factors of two per year to hundreds of Mbits/second. In addition, the INFN/Rome site is a tier B BaBar remote computing site and also has large bulk throughput requirements. SLAC has been very actively working with each of these sites to measure, understand and improve bulk throughput performance. We believe that the deployment of AIME probes at these sites will facilitate the measuring and understanding of network performance and greatly assist in optimizing bulk throughput.
· The LHC project at CERN, which will start to take data in 2005, has massive bulk throughput requirements. CERN is also part of the PingER project, and SLAC has very strong contacts with Olivier Martin of CERN, who leads the WAN group there.
· LLNL is a collaborator with SLAC on the PEP-II/BaBar project and is a BaBar remote site for Monte Carlo simulations. As such it has high throughput requirements between LLNL and SLAC.
· GATech is a collaborator with SLAC on the SciDAC OPTS proposal.
Candidate AIME probe sites
Site | Possible contacts | Interest | Deployment year/sequence priority
SLAC | Les Cottrell | AIME | 0
LBNL | Vern Paxson | AIME | 0
PSC | Andrew Adams | AIME | 0
ANL | Bill Nickless | AIME | 0.5
Caltech | Harvey Newman | PPDG, BaBar, PingER | 1
FNAL | Frank Nagy, Ruth Pordes, Phil deMar | PPDG, PingER | 1
SDSC | Regan Moore, kc claffy | PPDG, measurement/CAIDA | 1
UTK | Rich Wolski | AIME | 0.5
BNL | Mike O’Connor | PPDG, PingER | 2
ORNL | Nagy Rao, Bill Wing | CEENPAR & Stats proposal | 2
RAL or Daresbury | Peter Clarke, Robin Tasker, Paul Kummer | QoS proposal, BaBar, PingER | 2
Rice | Richard Baraniuk | INCITE proposal | 2.5
IN2P3 | Gilles Farache, Dominique Boutigny | BaBar | 2.5
INFN | | BaBar, PingER | 2.5
CERN | Olivier Martin | LHC, PingER | 3
JLab | Chip Watson | PPDG | 3
Wisconsin | Myron Livny | PPDG | 3
LANL | Wu-chun Feng | INCITE proposal | 4
GATech | George Riley | OPTS proposal | 4
LLNL | | BaBar | 4
Notes:
Deployment year = 0 indicates the probe is already in place
There are more potential sites than there are funded probes, so we do not anticipate problems in placing the probes.
Actual sequencing will depend on factors such as funding of related proposals, ease of installation, interest from the remote site, progress on measurements etc.
Question 3: Will the outcome of this project be beneficial to mostly network planners or scientists using the network? Please state how.
The project benefits both network planners and scientists.
· Network planners will benefit from the provision and ready availability of long term (up to several years) and short term (near real time) estimates of network performance, including round trip times, losses, jitter, spurious packets and reordering prevalence, reachability, throughput, and routing. This will be valuable for trouble-shooting, problem isolation (e.g. identifying when something changed and how), planning, setting expectations, setting and evaluating service level agreements, and providing input for performance prediction/tuning for network aware applications.
· Scientists will benefit: from being provided with more realistic expectations of past, current and future performance; from improved network aware applications for bulk throughput such as bbcp that are better able to self configure and more agile in responding to network changes; from faster identification and remediation of performance problems.
Question 4: The extension of NIMI capability described in section 5.1 does not appear to be part of the deliverables from any of the participating PIs. Also missing are the deliverables associated with sections 5.1, 5.2, 5.3, 5.4 and 5.5 for each of the three years.
First year deliverables:
As indicated in sections 5.1 and 5.6.1, SLAC will report on research comparing and contrasting various ways of estimating “jitter” or inter-packet delay variability, implement one or more of the promising candidates in NIMI and PingER, and verify the conclusions. SLAC will also report on estimators for bursty packet loss. The first version of the new user interface to NIMI analysis and reports will be put into production. This will include reports on RTT, loss, reachability, spurious (e.g. duplicate) packets, reordering and “jitter”.
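As a sketch of what comparing “jitter” estimators could look like, the code below applies three candidate estimators to the same series of per-packet delays: the smoothed interarrival jitter defined in RFC 3550, and the standard deviation and inter-quartile range of the inter-packet delay variation (IPDV). The selection of these three candidates and all function names are assumptions made for illustration, not the estimators the project has settled on.

```python
import statistics
from typing import List, Sequence


def ipdv(delays: Sequence[float]) -> List[float]:
    """Inter-packet delay variation: difference between consecutive delays."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


def rfc3550_jitter(delays: Sequence[float], gain: float = 1 / 16) -> float:
    """Smoothed jitter as in RFC 3550: J += (|D| - J) / 16 for each new D."""
    jitter = 0.0
    for d in ipdv(delays):
        jitter += (abs(d) - jitter) * gain
    return jitter


def ipdv_stdev(delays: Sequence[float]) -> float:
    """Population standard deviation of the IPDV samples."""
    return statistics.pstdev(ipdv(delays))


def ipdv_iqr(delays: Sequence[float]) -> float:
    """Inter-quartile range of the IPDV samples (less sensitive to outliers)."""
    q1, _, q3 = statistics.quantiles(ipdv(delays), n=4)
    return q3 - q1


if __name__ == "__main__":
    delays_ms = [10.0, 12.0, 10.0, 12.0, 10.0, 40.0, 10.0]  # made-up sample
    for estimator in (rfc3550_jitter, ipdv_stdev, ipdv_iqr):
        print(estimator.__name__, round(estimator(delays_ms), 3))
```

The estimators differ in cost and behaviour: the RFC 3550 form needs only constant state per stream, whereas the quantile-based form requires the samples to be retained but is less affected by a single delayed packet.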
SLAC will present early results on an instrumented high performance throughput application (bbcp) that uses NIMI and PingER results to adjust to network conditions in real time. The application will be put into production for BaBar.
As indicated in section 5.5, PSC will extend the NIMI probes to 5 sites during Year 1 of the project. Deployment to these sites will include initial installation and configuration of the systems. During Year 1 these systems will continue to be maintained and administered by the AIME/NIMI team.
PSC, in collaboration with Vern Paxson, will also begin work on the new resource control tools outlined in section 5.3. During Year 1 PSC will begin working on developing and implementing the fine-grained access control for the NIMI platforms. We plan to have an initial version of this implemented and deployed on the NIMI platforms by the end of Year 1. We will also begin to understand the issues associated with supporting inter-domain monitoring and defining measurement tool resource profiles. Prototypes for both will be defined and tested by the end of Year 1. We will begin development on implementing generic packet filters.
ANL will develop and deploy the first version of the Beacon tools integrated into the NIMI. Work will begin on storing and retrieving historic beacon data.
Second year deliverables:
SLAC will extend the NIMI measurement tools to include IPv6 monitoring support, bandwidth estimators and possibly cross-traffic estimators. A second version of the web user interface to NIMI analysis and reports will be deployed; new estimators reported on will include bursty packet loss and bandwidth estimation. A first version of the web user interface to the traceroute measurements will be developed and put into production. A first version of a web user interface to the Beacon data will be made available. If the LBNL passive monitoring is funded, then first results on the effectiveness of tying together passive and active measurements will be reported.
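To illustrate the kind of bursty packet loss report mentioned above, the following minimal sketch summarizes a probe sequence by its overall loss rate, the lengths of its loss bursts, and the conditional probability that a packet is lost given that the previous one was lost. For independent random loss the conditional probability should be close to the overall loss rate, so a much larger value suggests burstiness. These three statistics and the function names are assumptions for illustration, not the estimators the project will necessarily adopt.

```python
from itertools import groupby
from typing import List, Optional, Sequence


def loss_rate(lost: Sequence[bool]) -> float:
    """Fraction of probes that were lost."""
    return sum(lost) / len(lost)


def loss_bursts(lost: Sequence[bool]) -> List[int]:
    """Lengths of each run of consecutive lost probes, in order of occurrence."""
    return [sum(1 for _ in run) for was_lost, run in groupby(lost) if was_lost]


def conditional_loss_probability(lost: Sequence[bool]) -> Optional[float]:
    """P(probe i lost | probe i-1 lost), or None if no probe follows a loss."""
    after_loss = [cur for prev, cur in zip(lost, lost[1:]) if prev]
    if not after_loss:
        return None
    return sum(after_loss) / len(after_loss)


if __name__ == "__main__":
    # Made-up outcome of ten sequential probes: True means the probe was lost.
    sample = [False, False, True, True, True, False, False, False, True, False]
    print("loss rate:", loss_rate(sample))
    print("burst lengths:", loss_bursts(sample))
    print("P(loss | previous lost):", conditional_loss_probability(sample))
```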
A first version of some NIMI results (e.g. RTT, loss, “jitter”, bandwidth estimation) will be made available by SLAC via a documented interface (e.g. Grid Monitoring Service Architecture). Bbcp will be extended to utilize information from NWS to assist in adjusting to network conditions.
UTK will extend NWS to utilize the NIMI data for forecasting.
ANL will integrate a production version of the Beacon tools into NIMI. Additional features will be added to the beacon to determine multicast topology information.
As indicated in section 5.5, PSC will continue to deploy NIMI probes at 5 additional sites, including some international HENP sites. We will continue to maintain and administer these platforms during the first half of the year. Assuming that the NIMI infrastructure is robust and stable, by the end of Year 2 we will have begun the process of establishing a separate ESnet NIMI partition with one or more logical administrative domains. We will continue to work on the development and deployment of the resource control mechanisms outlined in Section 5.3. Specifically, we will finish the development and deployment of the inter-domain monitoring and measurement tool resource profiles. We will begin implementing measurement expirations.
Third year deliverables:
SLAC will make NIMI traceroute measurements available publicly via a standard API. Results from the NIMI Beacon measurements will be made available. A recommendation will be provided concerning project continuation.
PSC will continue to provide any needed support for the ESnet NIMI partition. While we don't plan to directly deploy any new NIMI platforms, we will work with sites that wish to deploy their own platforms. We will finish up the development and deployment of the NIMI resource control features outlined in section 5.3, focusing on fixing any bugs and fine tuning the interaction of these tools with each other as well as the rest of the system.
ANL will add performance measurement features to the Beacon tools. This will allow stress testing of multicast infrastructures. Presently, the beacon only transmits a very low level of multicast traffic.
Question 5: Since Rich Wolski of the University of Tennessee is not a funded PI, he may not be compelled to contribute to the critical work necessary for the success of the project. Similarly the contributions of Linda Winkler and Vern Paxson are also questionable.
Linda Winkler is listed in the Current and Pending Support section of the proposal as being funded from the current AIME proposal at the level of 9% or $45K over the total project. Linda Winkler continues to push IP Multicast deployment as part of the Access Grid project. Her current efforts are international deployment of the multicast beacon in support of the SC Global project. This AIME project will provide very useful insight into the types of debugging and tracking information that are critical to gather. She is also very involved in ESCC, ESSC and ESRC activities. She will work with ESnet staff and sites to implement and enhance their multicast infrastructure through better tools and measurement capabilities.
Rich Wolski provided a letter of intent to collaborate in this proposal, which was included at the end (Section 11.4). The provision of data from the NIMIs to the NWS will give the NWS and Rich Wolski a rich new source of information for developing new methods of forecasting performance. Thus Rich is very eager to get the data and to work on understanding it and integrating it into NWS. This is part of a separate SciDAC proposal (Net100), in which Rich Wolski is a participant, that will provide a measurement and forecasting information base that will be valuable in the AIME context. Indeed, Rich did not ask for funding from the AIME proposal because there was going to be so much overlap between what he was proposing for Net100 and the AIME project that it did not make sense to apply for double funding. If the NWS/Net100 is not funded, or the NWS team decides it is not in its interests to use our data to make forecasts, then we will focus on other ways to provide forecasts from the NIMI measurements. In particular, our close collaboration with Andy Hanushevsky, located at SLAC, the developer of a key PPDG application (the BaBar Copy program bbcp, which provides high performance bulk throughput), gives us the ability to tie the NWS forecasts into bbcp and/or use the direct NIMI measurements to facilitate improving bulk throughput. We have already started putting the hooks into bbcp to facilitate providing it with network predictive information from NIMI, NWS or other sources.
Dr. Paxson is deeply committed to the NIMI project, having worked on it for the last five years. As discussed in the PAM-2000 paper he and his NIMI colleagues wrote last year ("Experiences with NIMI"), the key problem of scaling the infrastructure up in a sound fashion remains. Deployment on ESnet offers a terrific opportunity for doing so, because it allows scaling in one dimension (size) while holding scaling in another dimension (administrative heterogeneity) constant, so the problem of scaling to the next stage can be tackled in multiple steps rather than as one large step. Furthermore, Dr. Paxson's continued affiliation with LBNL means he has direct access to, and trust relationships with, ESnet staff, which will be of great practical help with the deployment and operation.
Question 6: Will ESnet have its own NIMI infrastructure or will it be part of the global NIMI infrastructure? If so, has ESnet agreed to deploy NIMI boxes in its core?
The design is that ESnet laboratory high-performance sites will have NIMI hosts. Initially these will be part of the global NIMI infrastructure administered by PSC (see Section 5.5). Eventually, as the NIMI architecture improves, the system should be robust enough to enable partitioning the infrastructure into one or more logical administrative domains, where each domain would then be responsible for administering the NIMI probes it contains. The creation of a separate ESnet and/or HENP domain is in keeping with NIMI's fundamental design goal of supporting administratively heterogeneous infrastructures. In the third year of the project, as part of the evaluation of providing a smooth transition to a production service, we will explore whether to create other NIMI administrative domains (e.g. for ESnet, the PPDG and/or HENP) and plan to engage ESnet in these discussions and planning. We believe it is currently premature to start serious discussions with ESnet on this, since the full ramifications and the supporting infrastructure are still under development.
The deployment of NIMIs is expected to be at the national Laboratory ingress/egress sites and not within the ESnet core, since the NIMIs will be making end-to-end performance measurements. It may also prove desirable to place NIMIs at ESnet peering edges, to facilitate pin-pointing performance issues. This is not currently part of the proposal; however, it is a natural extension for the second or third year. The NIMI design allows remote administration and so enables such placement with very limited impact on ESnet apart from a small amount of collocation space. It will of course need to be discussed with ESnet management.
Given the close working relationship and frequent interactions of some of the AIME participants with ESnet (e.g. Vern Paxson spends time at LBNL, Les Cottrell is the chair of the ESnet Network Monitoring Task Force, Linda Winkler, Bill Nickless and Les Cottrell are all frequent contributors at various ESCC and ESSC meetings), the limited requirements from ESnet to support the placement of NIMIs at ESnet peering points, and the mutual benefits, we anticipate that such discussions should be very constructive.
We also plan to explore sharing the NIMI probes with the LBNL passive monitoring proposal assuming both projects are funded. We wish to work closely with them and sharing the resources could be a good way to encourage collaboration.