Author: Les Cottrell. Created: November 14, 2000
Set up to advise and provide a guiding vision & advocacy for the ESSC on research support/activities for ESnet that advance collaborative activities, to identify program requirements & opportunities for testing expanded network capabilities, to provide a forum for coordination of ESnet testbed activities, and to recommend the deployment of new research support services. The members are Wing (ORNL), Leighton (ESnet), Catlett (ANL), Johnston (LBNL), Winkler (ANL), Bair (PNNL), Livny (Wisconsin), Newman (Caltech), Zacharia (ORNL). The big emphasis today is support for Grids. The focus so far appears to be on test-beds and how to get access for different types of applications. Concerns about membership should be raised to the ESnet steering committee.
Included current plans/activities supporting network research and an overview of opportunities for collaboration with ESnet3 industry partners. DSG (DoE Science Grid) is an implementation of Grid within the DOE unclassified science area. Production services can help anchor the DSG as it moves from pilot to production status. ESnet is a default choice for many of the "core" services such as PKI, collaboration service center, bandwidth reservation.
Possible areas of involvement: applied network research, digital collaboration services, DSG network testbed, Grid directory services, network bandwidth management, X.509 certificate authority and certificate service.
The vision is to make the existing ESnet digital collaboration services into pilot-level DSG services. This includes collaboration support services, grid reservation, measurement and statistics, early applications of QoS-enabled traffic (e.g. an application-based QoS in the "petite" QoS model, research reserved bandwidth and MPLS components), adding streaming capability (i.e. webcast servers, both high and low end) and H.320<->H.323 gateways. A second vision is to provide a "persistent" wide area network research testbed. There is an existing testbed incorporating SLAC/Sandia-CA/LBNL...
Another requirement is to allow "power users" to reserve large portions of available bandwidth for fixed periods of time. The needed resources are reserved in concert with other required grid resources and claimed at run time. ESnet is developing a simplified reservation agent and interface; an assumption is that "gonzo" reservations are sparse, i.e. a few authenticated users, few runs, few hours/run. Many complicated issues arise: authentication, authorization and accounting are essential; it is difficult to allocate a distributed resource; it is difficult to allocate/reserve a distributed service. A whole (bureaucratic) infrastructure (similar to reserving supercomputer resources) is required. A global view by a reservation agent is required: it must find a network path that matches the requirement and set aside resources on the path; reserving an MPLS path may be an initial answer. The network will provide poorer service to other users at run time; is this acceptable? I raised the issue of a need for a scavenger QoS where the user can use all available bandwidth but is placed at the back of queues so has minimal impact on others; this is a bit like the ATM UBR service.
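The simplified reservation agent described above can be sketched minimally. This is only an illustration of the "global view" idea; the link names, capacities and the start/end-hour interface below are hypothetical, and a real agent would add the authentication, authorization and accounting issues noted above.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    """One network link with a fixed capacity in Mbps."""
    name: str
    capacity_mbps: float
    # (start_hour, end_hour, mbps) tuples for reservations already granted
    reservations: list = field(default_factory=list)

    def available(self, start, end, mbps):
        """True if `mbps` fits alongside reservations overlapping [start, end)."""
        used = sum(r for s, e, r in self.reservations if s < end and e > start)
        return used + mbps <= self.capacity_mbps

class ReservationAgent:
    """Global view: reserve a bandwidth slice on every link of a path."""
    def __init__(self, links):
        self.links = {l.name: l for l in links}

    def reserve(self, path, start, end, mbps):
        hops = [self.links[n] for n in path]
        if all(l.available(start, end, mbps) for l in hops):
            for l in hops:
                l.reservations.append((start, end, mbps))
            return True
        return False  # some hop lacks headroom; caller must try another path/time

# Sparse "gonzo" use: a few users, a few hours per run (hypothetical links).
agent = ReservationAgent([Link("SLAC-SNV", 155), Link("SNV-CHI", 622)])
assert agent.reserve(["SLAC-SNV", "SNV-CHI"], start=2, end=6, mbps=100)
# A second overlapping 100 Mbps request exceeds the OC3 hop, so it is refused.
assert not agent.reserve(["SLAC-SNV", "SNV-CHI"], start=4, end=8, mbps=100)
```

The refusal in the last line is the point raised in the discussion: a reservation that is granted degrades what everyone else sees at run time, so admission control has to happen on every hop of the path.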
There is a separate IPv6 testbed which will continue. There are routers available at ANL, LBNL, ORNL, SLAC and at STARTAP, and a new IPv6 router has been deployed at the NYC hub. Will incorporate H.323 IP-based video conferencing into DCS.
National Nuclear Security committee for weapons.
Advanced Scientific Computing (SC) advisory committee 1st meeting Oct 31 review. Membership from industry and Labs.
LSN still alive & well. Putting together a plan for post NGI activities (SII) to be reviewed by PITAC Sept 19. May start late May 2001.
DoE SBIR program is set by law at 2.6% of the MICS budget; there is White House interest. Awarded $3.8M FY00, for FY01 expect $4.2M. Two areas: computational aspects, networking. (see http://wbir.ed.doe.gov/sbir/)
Peer reviews: there will be periodic reviews of the FWPs, with an emphasis on a paper trail for review. There will be proposal peer reviews for any new proposals.
ESnet FY01 (FY00) budget: ATM contract $7M, possibly going to $10.5M ($7M); ESnet ops $6.5M ($5.5M); international $1.2M ($1.1M); video $350K ($300K); DoE Science Grid (DSG) $1M ($0K); equipment $900K ($1.5M).
Goals 00>01: complete transition to new provider in CY00; ESnet program review in early 2001; DSG testbed implementation & follow-on (http://www-itg.lbl.gov/NGI/); increased connectivity to universities/Europe & Asia; start transition to SONET; support network research program; MPLS research activities support; QoS services research accomplishments; revised program plan draft by fall meeting.
Summary: look at the future of ATM and SONET deployment; emphasis needed on increased security; network research / middleware emphasis; increase needed on performance measurements (needs to continue, needs to be useful to the community).
Objective: to develop secure & scalable hi-perf networks to support WAN-distributed high-end apps, accelerate the adoption of emerging net tech into production, and provide leadership. Elements: hi-perf net engineering, secure net-aware middleware, deployment & testing of emerging tech.
FY01 concentrates on E2E performance, enhancing TCP, and network measurements (goals: Gbps to science apps, end-to-end bandwidth estimation ...). FY02: experimental deployment, pilot projects, network security. FY03: wavelength brokering, secure QoS, network-aware applications. Need hi-perf protocols & services, scalable & smart security, hi-perf middleware & net services, network performance prediction. Impacts distributed WAN data-intensive computing.
Challenges: high-performance throughput to scientific apps; host congestion, TCP congestion, OS, NIC & I/O bottlenecks, net co-processor; net performance prediction (core net performance measurement, E2E path performance prediction, network weather services); advanced net services; distributed net security.
Hi-performance net engineering, with the goal to significantly increase performance of the existing net infrastructure, focusing on hi-perf transport protocols, traffic measurement and analysis, traffic engineering & analysis, and net modeling & analysis. Impact: high E2E Gbps performance for scientific applications, a highly instrumented net for net-aware scientific applications.
Approach recognizes DoE is not alone, also DARPA, NSF, industry. There are a sub-set of problems unique to DoE. Need integrated research program (net, middleware & apps), mission driven research & development, strong partnership between DoE Labs & universities, balance between basic research & development. Peer review process.
Basic research (3-6 years) to make significant contributions to the theoretical foundation of hi-perf net infrastructure, e.g. protocol extensions, advanced net design algorithms. Evaluation based on publications, RFCs, software distribution.
Deployment 1-2 years goal to accelerate the adoption of emerging net tech into production nets through deploy & testing. Examples: MPLS, net measurement, QoS. Evaluate based on deployment.
Resources include a persistent testbed with operational capacity of 39% of ESnet capacity, and a standing committee to provide a guiding vision and advocacy, coordination, recommendation of testbed services into the production network, and coordination of activities with other networks.
Net measurement & analysis includes net performance modeling providing analytical techniques (currently intractable for complex nets); a net research testbed is cost prohibitive for terascale nets. So use net measurement to improve performance of existing nets, isolate congestion points, design efficient traffic engineering strategies, and improve net design & ops, plus a terascale virtual simulator (to be used when analysis & modeling fail) with supercomputer simulation capability to predict the performance of future nets and tune design parameters. He suggests a network measurement center and a terascale simulation environment.
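As an illustration of why analytical techniques work for simple cases but motivate simulation for complex nets: the textbook M/M/1 queue gives a closed-form mean delay for a single link, but no comparable closed form exists for a terascale network of interacting queues. The rates below are made-up numbers.

```python
def mm1_delay(arrival_rate, service_rate):
    """Mean sojourn time in an M/M/1 queue: T = 1 / (mu - lambda).

    Valid only for a stable queue (arrival rate below service rate)."""
    assert arrival_rate < service_rate, "queue is unstable"
    return 1.0 / (service_rate - arrival_rate)

# One link is analytically easy: 800 pkt/s offered to a 1000 pkt/s
# link gives a mean delay of 1/(1000-800) = 5 ms.
assert abs(mm1_delay(800, 1000) - 0.005) < 1e-12
```

For a whole network the arrival process at each hop depends on every other hop's departures, which is why the proposal falls back on supercomputer-scale simulation when this kind of modeling fails.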
The FY2001 solicitation is coming out soon. Goals: significantly improve E2E performance, provide a network performance forecasting service, enhance the performance of transport protocols, provide secure & smart middleware for collaboration. Strategy: basic research $150K-300K, experimental deployment $1-1.2M. Proposals will be peer reviewed for quality, joint hi-performance & national collaboration. External reviewers will need enough verifiable information in the proposal to be able to recommend it.
Ongoing projects: NSF Web100 project to provide TCP instrumentation (DoE can get involved to deploy test, provide feedback); DoE net instrumentation, hi-performance NIC and net co-processors; DARPA optical nets.
Web100 concentrates on TCP only by providing a TCP MIB, assumes net is a black box, hopes to provide auto-tuning capability, there are no plans for middleware/grid service.
The wider problem for E2E includes I/O, TCP, NIC & interior net. Thomas wants to provide interior network MIB to allow one to see what is going on in the network, so one can do dynamic path selection (by for example locating congestion points). So instrument interior net, gather and analyze data.
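The dynamic path selection idea above can be sketched in a few lines, assuming a hypothetical interior-net MIB that exposes per-hop utilization. The hop names and utilization figures are invented for illustration.

```python
def congestion_point(per_hop_util):
    """Return the hop with the highest utilization: the likely bottleneck."""
    return max(per_hop_util, key=per_hop_util.get)

def pick_path(paths):
    """Dynamic path selection: choose the path whose worst hop is least loaded."""
    return min(paths, key=lambda name: max(paths[name].values()))

# Hypothetical per-hop utilization (fraction of capacity) from an interior MIB.
path_a = {"SLAC-SNV": 0.30, "SNV-CHI": 0.95, "CHI-ANL": 0.20}
path_b = {"SLAC-SNV": 0.30, "SNV-NY": 0.55, "NY-ANL": 0.40}

assert congestion_point(path_a) == "SNV-CHI"   # locate the congestion point
assert pick_path({"a": path_a, "b": path_b}) == "b"  # route around it
```

This is exactly the capability Web100 cannot provide from the end host alone, since it treats the net as a black box; it requires the interior instrumentation Thomas proposes.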
FY01 has 3 points: significant E2E performance with net measurement & analysis; online bandwidth estimation, net state info services, dynamic route selection; transport protocol enhancements.
Net measurement to understand net behavior, efficient traffic eng, facilitate deployment of QoS, efficient net analysis. Need to enhance TCP, since it is not emphasized for hi-performance nets.
Why MPLS (Thomas would like to see a project put forward on this)? It provides agile networking & traffic engineering. It eliminates ATM scalability issues (beyond OC48), the cell tax & the n-squared mesh problem, separates routing from forwarding, provides ultra-fast forwarding, and allows one to build a bandwidth broker. MPLS is implemented in the core of UUnet, C&W ... but with no QoS, no secure tunnels, static routing, and RSVP signaling (Cisco & Juniper). Questions are how DoE can benefit from MPLS, and when ISPs will offer MPLS with QoS, VPNs etc.
Terascale net simulation environment elements include multi-layer (application, middleware, net), multi-protocol (TCP, IP, SONET, DWDM), middleware (QoS, security, traffic engineering), traffic & topology generation, and a composable simulation environment with scalable end-to-end simulation.
The research needs to be closely coupled with ESnet, so it will bring something back into ESnet.
There are concerns that the budget of ~$1.5 M does not cover much today, so need to focus on what would generate benefits to ESnet.
When the program gets going there will be PI meetings coordinated with the ESCC, so they can provide feedback on what the community needs and what is coming down the pipe from the research community.
Headlines: Qwest transition underway, traffic grows at 100%/year (public peering problems wax & wane), academic traffic seems to be under control, international access much improved, international meeting held in Kyoto, work proceeding in GRID related support, DCS is growing, focus on H.323 & streaming.
Statistics: 649 bytes/packet, grown from 552 bytes/packet a year ago; 5-6 years ago it used to be about 200 bytes.
Contract with Qwest is 7 years (3+2+2), $50+ over 7 years, overlapping for up to 2 years with Sprint; moving about 35 sites; hope to complete within a year of the signing in Dec 99, i.e. the goal is to be done in CY2000. Based on 5 major hubs. Juniper routers, Cisco ATM switches. Two nets: ESnet 3 and the older ESnet 2, interconnected in Chicago at ANL. Latency is causing problems for some applications, in particular BaBar.
Inter-hub bandwidth will initially be OC3 at reduced cost, and will be turned up to OC12 as needed (when all hubs are on-line or sooner). There is a unique ESnet VBR service: normal is 0.5 port speed, can burst up to 2.4 hours/day; the speed restriction does not apply to UBR traffic, but UBR runs at lower priority than VBR traffic. The idea is to use VBR for production services and UBR for the testbed. All hubs are to be moved to the Qwest "550" network, a specialized ATM backbone for customers with hi-perf requirements; ESnet is testing now, expecting cutover Nov 20. The OC48s may turn out to be 4*OC12 for some time. The Sprint OC12 to the Chicago NAP will be dropped Dec 1, 2000.
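To make the VBR contract terms concrete, a back-of-the-envelope calculation of a day's maximum VBR volume, assuming the "0.5 port speed, burst up to 2.4 hours/day" terms apply to a single access port (the OC3 rate used below is the usual approximation, not a figure from the contract):

```python
def daily_volume_mbits(port_mbps, sustained_frac=0.5, burst_hours=2.4):
    """Upper bound on one day's VBR traffic: full port speed for the allowed
    burst hours, the sustained fraction of port speed for the rest of the day."""
    burst = port_mbps * burst_hours * 3600                       # Mbit at full rate
    sustained = port_mbps * sustained_frac * (24 - burst_hours) * 3600
    return burst + sustained

# OC3 access (~155 Mbps): roughly 7.37e6 Mbit/day, i.e. on the order of
# 0.9 TB/day under these contract terms.
v = daily_volume_mbits(155)
assert v > 0
```

The same function with `burst_hours=0` gives the pure sustained-rate bound, which shows how much of the headroom the daily burst allowance actually adds.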
SLAC OC3 access accepted Oct 2000, but most traffic routed thru Chicago since that is where ESnet 2 & 3 meet. This has badly impacted access between SLAC and LBNL or LLNL. They are looking at using existing T3 link to SLAC to provide special connectivity from SLAC to LBNL or LLNL. The link is available but the routing has to be carefully worked out.
LBNL has problems with the city of Berkeley in getting the construction go-ahead for installing the OC48 to Qwest; Qwest is applying for an exemption. They are looking for alternatives: a Sprint OC3 to Sunnyvale is being priced; microwave to Sunnyvale is not possible. FIX-W shares an OC12 with I2/Abilene. A MAE-W OC3 is on order, due early Nov, now delayed, so will change to a T3 and will order an OC3 to PAIX. PB-NAP contractual issues preclude access; may try a fresh start.
By Nov 10 probably 40% of traffic was on Qwest.
They plan enhanced peering with Abilene (plan OC12 at SNV, OC3 at CHI exists ...), and will drop existing ESnet T1 connections for Caltech, UCLA, UTA, FSU, NYU; all have much better alternate connectivity.
Abilene and CAnet are collaborating on a free international transit network, to allow off-shore networks to connect elsewhere than STAR-TAP; there will then be multiple peering points. Govt. nets may only be able to get an all-or-nothing peering option.
Japan's KEK now connects in SNV; JAERI cannot take advantage of this, so connects in CHI. JAERI & NIFS are saturated. KEK has 10Mbps ATM via SNV which is not saturated.
CERN has long periods (day or more) of 25Mbps utilization with US, mainly BaBar between IN2P3 & CERN.
The NY hub at 60 Hudson has OC3s for Abilene, SURFnet, NORDUnet & JAnet. JAnet has a 10Mbps PVC on the OC3. DFN, DANTE & INFN come into Telehouse. Europe-ESnet traffic goes via NY. Trans-Atlantic ESnet traffic is about 20Mbps.
ESnet role in DoE science grids (DSG). ESnet could provide services for PKI, collaboratory stuff (DCS), petite QoS, streaming and H.323, testbed facility, network bandwidth management to allow power users to reserve large portions of extra bandwidth for fixed periods of time.
Network research is looking at: a bandwidth reservation agent; low-bandwidth or "petite" QoS, e.g. for video conferencing & VoIP support, using low-bandwidth QoS with DCS as the initial application (no authentication, minimal shaping, policing, easily identified application); and MPLS source-selected network path capability, used for large-bandwidth QoS path reservation, to route QoS over a selected path, for traffic engineering to steer around congestion, and for eventual replacement of the ATM core.
ESnet program plan is expected to be ready for printing early 2001.
The objective is to provide superlative network performance for universities. They want to support advanced networking end to end, with 100Mbps cross-country being normal, and to make IPv6, MPLS and multicast all normal. Has OC12 and OC48 pipes, mainly OC48 apart from SEA. There are two end points with OC48 connections (GA Tech and Pacific Northwest). Over the next 6 months expect to see connection upgrades. About 50 different connections, most OC3, 12 OC12, 2 OC48. Sites connect in through GigaPoPs. The infrastructure has considerable capacity, with examples of 240Mbps. End-to-end performance varies widely; 40Mbps flows are not always predictable, and users do not know what to expect. They want to raise expectations, otherwise the capacity that has been invested in does not produce results. It is important to minimize packet loss, since it limits performance and it is under engineering control.
Solving end-to-end performance requires a distributed measurement infrastructure, a team of performance analysis specialists located at sites with cross-logon capabilities to do debugging, dissemination of best practices, and tools on various end-site hosts. They use the Surveyor tools co-located at router nodes on backbones with OC3 to the router. They want to be able to systematically test connectivity hop by hop across the net, using iperf and other tools to isolate/identify the network segment at fault; can we make it systematic and (eventually) automated? They want to work with ESnet on this.
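A sketch of what such systematic hop-by-hop testing might look like, assuming iperf servers are reachable at successive routers along the path. The output parsing, the 50% threshold and the host names are all invented for illustration, and the measurement function is injectable so the fault-isolation logic can be exercised without live servers.

```python
import subprocess

def measure_mbps(host, seconds=10, run=None):
    """Run iperf in client mode against `host` and parse the achieved Mbps.
    `run` is injectable for testing; by default it invokes the real iperf CLI."""
    run = run or (lambda h: subprocess.run(
        ["iperf", "-c", h, "-t", str(seconds), "-f", "m"],
        capture_output=True, text=True).stdout)
    out = run(host)
    # Crude parse: find the last summary line ending in "<rate> Mbits/sec".
    for line in reversed(out.splitlines()):
        if "Mbits/sec" in line:
            return float(line.split("Mbits/sec")[0].split()[-1])
    return 0.0

def isolate_fault(hops, expected_mbps, run=None):
    """Test successive hops along the path; the first hop that falls well below
    expectation (here, under 50% of it) brackets the faulty network segment."""
    for hop in hops:
        if measure_mbps(hop, run=run) < 0.5 * expected_mbps:
            return hop
    return None  # path performs as expected end to end

# Fake measurements standing in for real iperf servers along the path.
fake = {"rtr1": "... 90.0 Mbits/sec", "rtr2": "... 88.0 Mbits/sec",
        "rtr3": "... 3.0 Mbits/sec"}
assert isolate_fault(["rtr1", "rtr2", "rtr3"], 90,
                     run=lambda h: fake[h]) == "rtr3"
```

Making this automated is essentially the open question posed above: the loop is easy, but deploying measurement points at every hop and agreeing on thresholds is the hard part.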
I2 multicast has a working group with Kevin Almeroth of UCSB as chair. Currently use PIM-sparse mode, has steady growth. There are new developments including multicast network measurement tools, MIX and single-source multicast.
IPv6 is another area to collaborate in.
The QoS working group has Ben Teitelbaum as chair. It needs to support jitter- or loss-sensitive applications over imperfect paths, also "bottom feeding" applications (i.e. LBE, Less than Best Effort; could also be used for Napster) and better-than-average TCP performance. QBone premium service.
Demos at SC2000: a Stanford music example, SC-Abilene-CalREN2-Stanford campus, with CD-quality audio; it won awards for most captivating and best tuned. LBL had an application going through SC-Abilene-Emerge-ESnet-LBL with 2 different implementations of QoS across dissimilar networks; bugs were found and fixed.
International peering is improving, but that is not a reason for complacency. Nowadays the transatlantic link may not be the bottleneck and is becoming more robust. People are thinking of new bandwidth-hungry applications, and expectations and demands are increasing.
Miami is connecting to Global Crossing to provide connections to Chile, Brazil and Argentina.
Abilene-ESnet cooperation could also be in areas of exploring use of QWest lambdas, looking at opportunities for high-performance peering at SNV, NY & CHI and to continue QoS trials.
University to Lab paths are important to I2. Measurement technologies needed could be developed together and shared. Source specific multicast and applications is another important area as well as continue collaborating on IPv6 and QoS.
OC48c & OC192c rates with point to point service, provisioning in near real time. Has a "surgenet" testbed. Ties in with existing ESnet hub locations at SNV, CHI, NY & DC. Pricing based on use etc. not finalized. Big topics were security, performance and IPv6. They are interested in setting up a testbed for ESnet.
Qwest says they can lay 2-3 miles/day of fiber in city streets. GST is going out of business; this is holding up the LLNL connection, expect completion 1Q2001. The LBNL link is held up by the city of Berkeley.
Security is a big issue especially being pushed from HQ and driven by congressional/public scrutiny. Need to build some rationality in.
The ESSC would like to hear more executive level overview with information on the working group activities, in particular who are the players etc. It might be interesting to have a joint meeting between the ESSC and the ESCC, but such meetings would be long especially since some people need to be at both meetings.
ANL looked at several Labs to see how they architect their firewalls etc. The idea is to have a green network for services visible outside (e.g. web servers, < 1% of hosts), and a yellow network with 99% of hosts/services.
LLNL has a visitor subnet, and has the ability via a web host to open a hole in the firewall dynamically (i.e. open up to a host) for a limited duration. At LLNL they use filtering routers, and Linux-based proxy servers to get to the yellow network. They scan the green hosts weekly; there is an approval process for locating systems in the green network; about 1% of hosts are on the green net. BNL uses VLANs to deploy the green network. They use a Cisco PIX. They provide guest computers so visitors do not have to bring in their own. INEL blocks netbios, NFS and X-windows, and has an intrusion detection system. ORNL relies on dynamic router filters together with intrusion detection to configure the router filters in real time to block things; thus the majority of the ORNL site is open to the outside world. PNNL requires a security plan for any host on its green network. They also have an anonymous FTP mirror where things are FTPd to the green host and then copied instantaneously to the internal host and vice versa; files are then deleted automatically after a couple of days. Sandia allows access to specific hosts and ports (e.g. ftp & web) inside the yellow network. VPNs are in common use for home users to access the yellow hosts.
ANL has not finalized its solution, have purchased a firewall and will have a green and yellow network, but still looking at how to decide what lies on what colour network and the migration plan.
Connectivity included OC-48 ATM, OC-48 POS to NTON, OC-48 POS to Abilene, 2*OC12 POS to vBNS, OC12 ATM to ESnet and 12 Mbps of ISP services over Qwest. Cisco provided wireless. 89.9 miles of fiber, 200 exhibitor booth drops. A major problem was the destruction of a Juniper M160 router by a forklift. The BW challenge was won by LBL (up to 1.4Gbps), with ANL second. Showed MRTG plots of utilization during the Visapult/LBNL demo. One surprise was that some of the WAN challenge people were unaware of the need to tune the TCP stack (big windows etc.) to get high performance.
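The TCP tuning issue comes down to the bandwidth-delay product: the socket buffer (window) must cover bandwidth times round-trip time to keep the pipe full, and the default windows of the era were nowhere near that. A quick illustrative calculation (the 1 Gbps / 60 ms figures are hypothetical, not from any particular SC2000 entry):

```python
def tcp_window_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product: the socket buffer needed to fill the pipe."""
    return bandwidth_mbps * 1e6 / 8 * (rtt_ms / 1000)

# A cross-country 1 Gbps path at 60 ms RTT needs ~7.5 MB of window,
# orders of magnitude above a typical 64 KB default, hence the poor
# untuned performance seen at the WAN challenge.
w = tcp_window_bytes(1000, 60)
assert round(w) == 7_500_000
```

Conversely, a stack stuck at a 64 KB window on that path can move at most 64 KB per RTT, around 8.7 Mbps, regardless of link capacity.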
72 ports for the audio bridge, occasionally overbooked. FTS2000 is dead; that eliminates the problem of some video rooms being unable to call others. They will add transcoding support so they can support more protocols, especially some very old and new ones. They will add a "test" conference for users to try out their video systems, in particular to check the audio environment for levels and echo. Sites in future will need to run their own H.323 gatekeepers. The DCS 2 release date is unknown; beta testing starts soon. Someone will replace Joe Metzger, who has been doing the DCS development.