DOE Office of Science Notice 01-06
Collaboratory Pilot: High Performance Networks
Title of Proposed Project:
Optimizing Performance of Throughput by Simulation (OPTS)
Principal Investigator:
Les Cottrell, (650)926-2523, FAX (650)926-3329, <cottrell@slac.stanford.edu>
Stanford Linear Accelerator Center (SLAC), MS97, 2575 Sand Hill Rd., Menlo Park, California 94025
Key members of team:
Warren Matthews, Andy Hanushevsky, + student, SLAC
George Riley + student, Georgia Institute of Technology
Rich Wolski, University of Tennessee at Knoxville
Submitted to:
High Performance Networks Research Program, Mathematical, Information and Computational Sciences Division,
Office of Advanced Scientific Computing Research, U. S. Department of Energy, 19901 Germantown Rd, Germantown, MD 20874-1207
Summary description of proposed research
Many of today’s networks and emerging networks enable increased levels of collaboration for DOE scientists and their collaborators. At the same time, there is an increasing need to replicate and access large-scale (terabyte and beyond) databases from anywhere on the network. Providing high-performance access today, however, requires a thorough understanding of many technologies, including how congestion is managed, which metrics determine performance, what the values of those metrics are for the paths involved, and even what the paths themselves are. As a result, most scientists are unaware of what is possible, what the expectations and constraints are, and how to address these issues.
We propose to make bulk-throughput measurements, together with other measurements (e.g. round-trip times, bottleneck bandwidth), on a variety of network paths of interest to the ESnet community. The results of these performance measurements will then be used to construct realistic simulations of the actual network paths, using the popular and widely used ns-2 network simulator. The observed measurements and simulation results will be compared to establish the range and level of detail over which the ns-2 simulator agrees with observation, and to understand how performance interacts with the various metrics. This information will be fed back to the ns-2 developer community to assist in identifying where improvements are needed. Once the simulator’s validity is understood, it will be used to help users understand today’s poor performance, to provide guidance on the settings required to improve performance, to set expectations of what is possible, and to predict the improvements expected from upgrades. In addition, we believe we can use the simulator to identify where a bulk-throughput transfer saturates the network and even over-runs it (i.e. increases the offered traffic without increasing the throughput). With this information we expect to be able to provide guidelines for setting bulk-throughput application parameters so as to limit the impact on other users while improving bulk-throughput performance. Feedback from these activities will in turn be used to provide an easy-to-use front end to the simulator, and simplified distribution techniques to enable wider use of the simulator for tuning etc. We will work with developers of parallel bulk-throughput applications to explore how to improve the applications based on our findings. We will also collaborate with the Network Weather Service (NWS) project to provide a new forecasting method based on simulation.
There is an increasing need today for high throughput by data-intensive science applications, such as the Particle Physics Data Grid (PPDG), remote backup and archiving, and data replication. At the same time, networks, especially the research and education networks such as ESnet, Abilene and those in Europe and Japan, are becoming increasingly capable of high throughput. However, it is difficult for application users to achieve high performance, since doing so requires expertise beyond that of most users. This difficulty arises because: host systems are optimized for low bandwidths and latencies; configuring the host’s OS and its TCP/IP stack to achieve higher performance varies between operating systems and usually requires system privileges; TCP by design hides the problems; and there is a lack of instrumentation and tools to diagnose performance issues. So far, most of the work on understanding how to improve performance has not investigated varying the number of parallel flows. Parallel bulk-transfer applications are emerging that take advantage of multiple simultaneous streams running on top of the TCP stack. Little has been done so far to see how using multiple flows interacts with large windows, or how to optimally set the window size and number of flows to best achieve high throughput with both today’s common TCP stacks and newer ones. There has also been little work to understand and limit the impact of a high-throughput bulk data transfer on other applications using all or part of the path it traverses.
We propose to make TCP throughput measurements with multiple window sizes and parallel flows, using tools such as iperf, GridFTP, bbftp and sfcp, for a range of network-connected hosts, with round-trip times (RTTs) varying from a few to hundreds of milliseconds, bandwidths varying from Mbits/s to Gbits/s, and using a variety of networks and Internet Service Providers (ISPs). The paths will be chosen as being typical (mainly in terms of bottleneck bandwidth and RTT) of the various paths of interest to the ESnet and High Energy and Nuclear Physics communities, for the ability to get iperf servers installed at the site, and for the ability to find administrators/users at the site who are supportive of our efforts. Simultaneously with the bulk-throughput measurements we will also make ping measurements of RTT, loss and variability to crudely represent the effect of bulk throughput on other packets at the bottleneck. We will compare these results with those obtained from the ns-2 network simulator. In particular we wish to quantify how well, and over what range of the parameters (RTT, bottleneck bandwidth, window size, flows, queue lengths, congestion, TCP implementation, run time), the simulator agrees with observation for various metrics (goodput, RTT, loss and variability). We will also look at the sensitivity of the parameters, try to decide which ones are most critical, and provide guidelines on how to obtain reasonable starting estimates for them. Since improvements of factors of 5 to 60 have been observed in bulk throughput by optimizing the maximum window sizes and number of flows, only rough agreement between the simulator and observation is required for the simulator’s results to be effective in tuning the TCP stack to optimize bulk throughput.
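The tuning space described above is anchored by the bandwidth-delay product (BDP). As a rough illustration only (not part of the measurement plan), the idealized window sizing can be sketched in Python, assuming a clean, loss-free path:

```python
def bdp_bytes(bottleneck_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return bottleneck_bps * rtt_s / 8.0

def per_flow_window(bottleneck_bps, rtt_s, n_flows):
    """Split the BDP across parallel flows (idealized: equal shares, no loss)."""
    return bdp_bytes(bottleneck_bps, rtt_s) / n_flows

# Example: 155 Mbit/s bottleneck, 80 ms RTT
bdp = bdp_bytes(155e6, 0.080)          # about 1.55 MB of total window needed
w = per_flow_window(155e6, 0.080, 8)   # about 194 KB per flow with 8 flows
```

In practice loss and cross-traffic push the optimum away from this idealized value, which is exactly what the measurements and simulations are intended to quantify.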
We also intend to modify some bulk-throughput applications to adjust the window/system memory buffer sizes at the start of a connection, and later to adjust them dynamically during the transfer, so that they remain optimal and the buffer sizes are coordinated between transmitter and receiver.
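As a hedged sketch of the kind of per-connection buffer adjustment intended (the applications actually to be modified are bbftp, sfcp and similar; this sketch just uses the standard socket options via Python):

```python
import socket

def tune_buffers(sock, target_bytes):
    """Request send/receive buffers sized to the path's bandwidth-delay
    product; the OS may clamp the request to its configured maxima."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, target_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, target_bytes)
    # Read back what the kernel actually granted (Linux reports double
    # the requested value, to account for bookkeeping overhead).
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
granted = tune_buffers(s, 2 ** 21)  # ask for 2 MB each way
s.close()
```

Note that on most operating systems of this era the buffers must be set before the connection is established for large windows to take effect, which is why coordinating transmitter and receiver at connection start is part of the plan.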
2.1 Simulator extensions
We propose to use the ns-2 simulator to model ESnet-related network paths, using realistic values for various simulation parameters (such as RTT and bottleneck bandwidth). We will provide some extensions to the baseline ns-2 simulator, such as arbitrary packet reordering to observe the effect of such anomalies, reporting of the mean and variance of the RTT for each flow to see whether we can estimate loading, and a model for cross-traffic to understand its effects. We will also provide a front end around the simulator to simplify running simulations with a variety of maximum window sizes, parallel flows and other parameters, and to enable the simulation results to be easily compared with observations. To bring the ability to use the simulator to choose bulk-throughput parameters to a larger audience, we will provide a simpler, point-and-shoot interface to the simulator with online help, and package the simulator and attendant tools for easy download and installation on a variety of platforms. This will enable non-expert users at other sites to use the simulator to make predictions etc.
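To illustrate the kind of front end proposed, a minimal sketch of a parameter sweep over window sizes and flow counts. The `toy_goodput` stand-in model below is hypothetical and exists only to make the sketch self-contained; in the real front end each point would invoke ns-2 with a generated simulation script:

```python
import itertools

def sweep(windows_kb, flows, run_one):
    """Run one simulation per (window, flows) combination, collect the
    reported goodput, and return the results sorted best-first."""
    results = []
    for w, n in itertools.product(windows_kb, flows):
        results.append((run_one(w, n), w, n))
    return sorted(results, reverse=True)

# run_one would normally invoke ns-2 on a generated OTcl script, e.g.
#   subprocess.run(["ns", "bulk.tcl", str(w), str(n)], ...)
# Toy stand-in: window-limited per-flow rate, capped at the bottleneck.
def toy_goodput(window_kb, n_flows, bottleneck_mbps=155, rtt_ms=80):
    per_flow = window_kb * 8 / (rtt_ms / 1000.0) / 1000.0  # Mbit/s
    return min(bottleneck_mbps, per_flow * n_flows)

best = sweep([64, 256, 1024], [1, 4, 8], toy_goodput)[0]
# best holds (goodput, window_kb, n_flows) for the best combination
```

The front end would additionally record the observed metrics alongside each simulated point, so that simulation and observation can be compared directly.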
We also plan to provide a network enabled simulation server to accept necessary parameters from a client such as the NWS and return goodput and other estimates. The NWS maintains its own database of network performance data which it gathers from a distributed set of sensors. It also includes a forecasting subsystem that applies a suite of performance prediction models to the most recently gathered data, and makes the resulting predictions available in real-time. By comparing the prediction error over time, the NWS adaptively chooses the most accurate model from its suite to use at any given point in time. As part of this work, we will provide an on-line client-server interface to ns-2 so that the NWS can include predictions resulting from simulation in addition to the suite of models it now supports. The study of how simulation-based predictions (based on recently gathered data) can be combined with statistical forecasting techniques to make on-line end-to-end performance predictions is a significant scientific contribution that this proposal will make.
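The NWS adaptive-forecasting idea described above can be sketched as follows; the two predictors and the error accounting here are deliberately simplified stand-ins, not the actual NWS model suite:

```python
def last_value(history):
    """Trivial predictor: next value equals the most recent one."""
    return history[-1]

def running_mean(history):
    """Trivial predictor: next value equals the mean so far."""
    return sum(history) / len(history)

def nws_style_forecast(history, predictors):
    """Pick the predictor whose one-step-ahead error over the history
    was lowest, in the spirit of the NWS adaptive forecaster, then use
    it for the next prediction.  (The real NWS applies a larger suite
    of models and more careful error accounting.)"""
    def total_error(p):
        return sum(abs(p(history[:i]) - history[i])
                   for i in range(1, len(history)))
    best = min(predictors, key=total_error)
    return best(history)

# A throughput series with a level shift: last-value tracks it better
# than the running mean, so it is selected.
measurements = [40.0, 42.0, 41.0, 60.0, 59.0, 61.0]
forecast = nws_style_forecast(measurements, [last_value, running_mean])
```

A simulation-based predictor would slot into the same framework as one more entry in the predictor suite, which is what the proposed ns-2 simulation server enables.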
If we run into problems with execution times of the simulations especially as we try to achieve larger simulations, then we will use the "Parallel/Distributed ns" and the "NixVector ns", both developed at Georgia Tech.
2.2 Uses
With the existence of network simulators such as ns-2, it is much simpler to simulate throughput than to actually measure it for many paths (see below for more discussion of this). The simulator can therefore be used to more simply understand the current performance achieved by bulk-data-throughput applications, especially those using non-optimal default settings, and to aid in optimizing the parameters. It will also be of value for predicting the performance available between sites on existing paths with existing TCP stacks, and how well a proposed upgrade to the path or stack can perform. After an upgrade is put in place, the simulator can be used to help ensure the expected performance is achieved and to help identify where further improvement may be needed. The simulator can also be used to look at the impact of bulk throughput in terms of increased losses and RTT variability, and guidelines can be provided to limit that impact, e.g. by deliberately setting the parameters so that the bulk throughput leaves some fraction of the bandwidth unused, or so that it does not drive the network into unnecessary packet losses and over-runs. Integrating the simulator with the NWS via a simulation server will provide another forecasting model for the NWS, as mentioned previously. In the reverse direction, interfacing to the NWS will enable our project to "pull" information from the NWS so the simulation can build an archive of expected throughputs, end-to-end network probe data etc., which will be useful in different off-line simulation and analysis contexts.
2.3 Advantages & disadvantages
The simulator makes it very easy to adjust many of the parameters that affect throughput and quickly identify the effects of each parameter on the throughput. There are available methods to realistically estimate many of the parameters needed for the simulation. For example ping can be used to estimate the RTT, and pipechar or pchar can give estimates of the bottleneck bandwidth for lower speed (<=T3) paths. Since the simulations can be done fairly quickly one can also optimize the parameters to improve the fit between observed and simulated values of various metrics. The simulator does not impact the performance of the real network by making active measurements (apart from measurements made to determine the simulator’s initial parameter estimates). Since bulk throughput measurements can easily utilize over 90% of the bottleneck bandwidth this can be a very important consideration. There is no need to install iperf or another application client and server at local and remote hosts, so one does not need the collaboration of a remote administrator to do the install and configuration or to make available an account and password. Given today’s security concerns, this is a major advantage. There are a relatively small number of parallel high performance bulk throughput applications in major use today in the High Energy and Nuclear Physics community, yet these provide a major fraction of the utilization of many bottlenecks for long periods (days at a time). It is therefore relatively easy to identify these applications and their users/developers and work with them to improve the performance both of the application and to reduce the impact on the bottleneck. The latter is extremely important until the appropriate Quality of Service (QoS) mechanisms are defined, implemented and deployed in the appropriate places so we can use them to limit the impact of high performance bulk throughput applications.
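A useful back-of-envelope complement to both simulation and measurement is the macroscopic TCP model of Mathis et al., which estimates single-flow throughput as (MSS/RTT)·C/√p for loss rate p. A sketch, assuming the conventional constant C = √(3/2); this is a sanity check on simulated goodput, not a replacement for it:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Macroscopic TCP throughput estimate: rate ~ (MSS/RTT) * C/sqrt(p).
    Assumes a long-lived flow governed by congestion avoidance."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

# 1460-byte MSS, 80 ms RTT, 0.1% loss: roughly 5.7 Mbit/s per flow,
# which hints at why parallel flows and larger windows matter on
# long, lossy paths.
est = mathis_throughput_bps(1460, 0.080, 0.001)
```

Where the simulator disagrees strongly with both this model and observation, that disagreement itself is informative feedback for the ns-2 developers.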
There are some concerns that need to be recognized and addressed. These include the fact that the simulator is less accurate than actual current observations. This is why we will be making real observations on a variety of paths to validate the simulator and understand its range of applicability etc. Estimating the bottleneck bandwidth for high speed paths is tricky even with tools like pipechar, so in some cases more detailed measurements and/or contacts with ISPs and site administrators will be needed to understand a path in some detail. Blocking or rate limiting of ICMP at some remote sites may mean that getting an estimate for the RTT parameter may be a bit tricky; however, in our experience this is not a problem for 98% of the sites of interest today, and in most cases it can be resolved by contacting the site administrator or simply pinging the border router. We still need to understand the impacts of other parameters such as queue lengths, congestion, runtime etc. This will be an early goal of this project, though initial estimates show there is quite a lot of leeway in setting these parameters. We will develop more sophisticated ns-2 models to more accurately measure and understand the effects of cross-traffic, i.e. traffic competing at the bottleneck with the simulated flow(s). Fortunately many of the paths used between high-performance sites are "over-engineered", so frequently a single bulk-throughput application instance will absorb most of the bottleneck bandwidth. However, as more users take advantage of advanced high-throughput applications, the competition for bandwidth is likely to increase and new simulation models taking account of cross-traffic will be needed. We hope to spur the development of better measurement, understanding and modeling of cross-traffic by identifying and studying breakdowns in the current simulation techniques.
3. Anticipated results
The validation of the simulator against bulk-throughput observations for a wide range of paths (SLAC has access to a 155 Mbps link to ESnet, a 622 Mbps link to the Stanford campus and Internet2, and a 2.5 Gbps link to NTON, in addition to having major collaborators in many countries) will help indicate the range (paths and metrics) and level of detail over which confidence can be placed in the simulator. In turn this information should also provide feedback to the ns-2 developer community to improve ns-2 (George Riley of Georgia Tech, a key person for the current proposal, is an ns-2 developer). With the availability of such a validated tool we will then have another way to assist in optimizing Internet throughput performance, especially for parallel FTP. We believe it will provide a useful low-impact middle ground between simply estimating the bottleneck bandwidth * RTT product to give the optimum window size (which provides no information on the impact of parallel flows) and doing full bulk-throughput measurements (which is very network intensive and can be difficult to set up). We also hope to provide realistic estimates of the impact of the bulk throughput on other users, and guidance on how to set the parameters to make this impact acceptable while achieving reasonable bulk throughput. Our close contacts with the BaBar physicists and applications developers (the experiment is based at SLAC) and the PPDG community (the PI is a member of the PPDG collaboration) will enable us to provide them with ready assistance to tune and enhance their applications to take advantage of our experience (bbftp is a BaBar parallel FTP application; Andy Hanushevsky, one of the key people on this proposal, is the architect of sfcp). We also anticipate tying the current work in with the Web100 developments, since we are an alpha-level tester and in close contact with the developers.
Finally with strong ties with the SLAC LAN engineering group (the PI is head of the group) and access to the ESnet testbed, we believe we will be able to make other detailed measurements to compare with SNMP accessible statistics collected by routers and switches.
Deliverables will include:
- publications comparing simulation with observation for high bulk-throughput applications, and feedback to the ns-2 community on the validity of the simulator;
- an extended ns-2 providing packet reordering, RTT variation and loss reporting, together with the development of a model for cross-traffic;
- a front end to ns-2 to enable users and application developers to tune high-throughput applications, including code packaged for download and installation on the major platforms of interest, with documentation on how to use it;
- a web site providing access to results, status reports, the proposal and the participants;
- consulting for research users and application developers on how to tune or modify their applications, setting expectations, etc.;
- feedback to the Web100 developers on usage and requirements.
4. Project schedule
5. Budget
Total budget for this project is $380K/year for three years. The breakdown is roughly: University of Tennessee at Knoxville $135K/year, Georgia Tech $75K/year, SLAC $170K/year.