ICFA SCIC Dec '01 Meeting, CERN, Geneva

Author: Les Cottrell. Created: December 8


Attendees: 

Harvey Newman (Caltech), Dean Karlen (Carleton), Richard Hughes-Jones (Manchester), Denis Langlin (IN2P3), Manuel Delfino (CERN), Slava Ilyin (MSU), Matthias Kasemann (FNAL), Michael Ernst (DESY)

Introduction

The ICFA chairman wants the SCIC to continue; Matthias has asked to be relieved of the chairmanship. The ICFA chair asked for input and suggestions and agreed that Harvey Newman would be the new chair. The official changeover will be at the ICFA meeting at SLAC on February 14-15. ICFA also wants the SCIC to give a short interim status report at the SLAC meeting. An important part of the SCIC mandate is also to look after the more poorly connected HEP collaborator countries.

Update on US connectivity from CHEP - Harvey

Large amounts of data (already many hundreds of TBytes; BaBar alone has over 400 TB) and data Grids are emerging. By 2005 the major Labs will need 2.5-10 Gbits/s. The model for HEP computing is based on hierarchical tiers (tier 0 at the Lab, tier 1 at major regional/national computer centers, then institutions, then departments, then users). Upgrades are quickly utilized; typically an upgrade sees heavy use within a month of installation.

The HEP plan for bandwidth from FY2001 through FY2006 can be found at http://gate.hep.anl.gov/lprice/TAN. The CERN link goes from 155 Mbps to 2*155 Mbps this year (2001), and to 622 Mbps in April 2002. DataTAG will provide a 2.5 Gbps research link in summer 2002 and a 10 Gbps research link in 2003 or 2004. A big surprise in drawing up the TAN review was that the stated requirements were bigger than expected; in particular the reality of BaBar distributing data to IN2P3 was much greater and came much earlier than expected. A big effect was the cost of people and the need to make them truly effective collaborators: bandwidth is needed to use these people effectively, and at the same time the cost of bandwidth is dropping while the cost of people is increasing.

In the Hoffmann report, network requirements are dominated by the high performance sites, but they are well ordered and very conservative (there is a lot of potential for growth). Even remote interactive sessions have large requirements, and the stated requirements were deliberately restricted (e.g. only 30 active interactive sessions).

To indicate that these numbers for HEP are in line with the rest of the Internet: the growth of US Internet traffic has historically been 2.8 times per year and has now grown to a factor of 4 per year. By 2010 this corresponds to 10 Pbps. One possible driver for this bandwidth (according to one Internet expert) is an Internet TV channel per person in the first world by 2006. Another confirmation is that the Amsterdam Exchange bandwidth has grown by a factor of 4 in the last year. Such growth in backbone bandwidth also requires a large investment in local infrastructure.

In the last year transatlantic link bandwidths have gone up by factors of 3 to 16. Sylvain Ravot has achieved 120 Mbps between CERN and Caltech. He applied a modification to the Linux kernel to improve the behavior of the TCP slow start algorithm.

There is an Internet2 HENP networking working group. One issue is a new concept of fairness. Very low loss is required, e.g. to get 1 Gbps between LA and CERN needs a 7*10^-7 loss rate. Problems at high speed include firewalls, router and other interface costs, TCP, losses, error rates etc.
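Loss-rate requirements like this follow from the standard TCP throughput bound (the Mathis et al. formula, throughput <= MSS/(RTT*sqrt(loss)), the same relation Carleton uses below). A minimal sketch of the arithmetic in Python; the MSS and RTT values are illustrative assumptions, not figures from the meeting:

    # Sketch of the Mathis et al. TCP throughput bound:
    #   throughput <= MSS / (RTT * sqrt(loss))
    # The MSS and RTT below are illustrative assumptions.
    import math

    def mathis_throughput_bps(mss_bytes, rtt_s, loss):
        """Upper bound on steady-state TCP throughput in bits/s."""
        return (mss_bytes * 8) / (rtt_s * math.sqrt(loss))

    def required_loss(mss_bytes, rtt_s, target_bps):
        """Largest loss rate at which target_bps is still achievable."""
        return ((mss_bytes * 8) / (rtt_s * target_bps)) ** 2

    if __name__ == "__main__":
        mss = 1460       # bytes: standard Ethernet MSS (assumed)
        rtt = 0.170      # seconds: assumed LA-CERN round trip time
        print("loss needed for 1 Gbps: %.1e" % required_loss(mss, rtt, 1e9))
        print("throughput at 1e-4 loss: %.1f Mbps"
              % (mathis_throughput_bps(mss, rtt, 1e-4) / 1e6))

The exact loss figure one derives depends strongly on the MSS, RTT and constant factor assumed, which is why quoted requirements vary.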

Abilene has a 2.5 Gbps backbone and reaches all 50 states. The partnership with Qwest has been extended through 2006, and the backbone will be extended to 10 Gbps by 2003. Internet2 has proposed 100 wavelengths for research. The engineering staff are very helpful, open and willing to work with advanced users. One fiber can hold 160 wavelengths at 10 Gbps each.

ESnet has had a plan in place that is slower than other developments (e.g. Internet2) and in particular does not meet HENP growth needs. ESnet will need to evolve quickly to keep up with Internet2 and with HENP and other user needs.

Wireless is evolving quickly to higher bandwidth. The Isle of Man is now deploying 3G phones (as of yesterday), and 3G phones will in turn drive IPv6 adoption to meet addressing needs.

We will need working groups within experiments to address networking needs, and we need to start applying the Grid prototypes to actual experiments.

Russian international connectivity - Slava

Science & education in Russia are mainly centered in Moscow, St Petersburg and Novosibirsk. The Soros foundation has had a program to develop education in many regions. In Spring '01 a state program (Electronic Russia) was started to support telecommunications (not yet funded for the coming year). How will telecommunications improve in the universities? There is a Russian backbone network centered on Moscow (within Moscow 100 Mbps, with a 1 Gbps upgrade started) with 100-150 Mbps to St Petersburg. Novosibirsk has 30 Mbps to Moscow (from RBCOM). In the Moscow region there are MSU, Dubna and Protvino/Serpukhov. Moscow to Dubna is 30 Mbps; next year they plan to upgrade to 155 Mbps and potentially in following years to 1 Gbps (there are optical fibers between Moscow and Dubna). In summer 2001 there was a 6 Mbps microwave link between Moscow and Protvino, with no prospect of going to 1 Gbps. There are plans for optical fiber over the 120 km distance, but at the moment there is no fiber for 40 km of the route from Moscow to Protvino. There is also a scientific center at Troitsk, which has about 8 Mbps and hopes for 30-40 Mbps very soon. In the St Petersburg area the main HEP institute is in Gatchina, 40 km south of St Petersburg; there is a fiber optic cable, but it is privately owned and too expensive (a monopoly), so at the moment there is only 128 kbps from Gatchina to St Petersburg.

International connectivity is provided by many telecommunications centers. There are two directions: the first, under the Ministry of Science & Technology (MoST, which includes the Labs and MSU), runs RBnet; the second, under the Ministry of Education (MoE, mainly universities), runs RUNnet with its center in St Petersburg. MoE was at 30-32 Mbps and has just upgraded to 155 Mbps to NORDUnet in Helsinki; MoST has the FastNet project. For the last 4 years MoST had a link supported by some state programs, at 16 Mbps by summer 2001, with different technical organizations operating it. There was a project, MIRnet, supported by MoST and NSF, with a 6 Mbps link to STARTAP. It peered with ESnet in June 2000 and had good connectivity to the US Labs. In summer 2001 MIRnet was transformed into FastNet and its budget was merged with the regular MoST budget. There was a tender; the winning operator was Teleglobe, with an intermediate PoP in Frankfurt and thence to New York in the USA at 45 Mbps (it should be 155 Mbps, but there are technical problems they hope to solve in December or January). This is under the same technical cover as RBnet; politically it is part of a Supercomputer center in Moscow. The 155 Mbps will be subdivided, with 90 Mbps to the USA commodity Internet; within this will live the MIRnet project. Connectivity to the USA is in particular for Grid projects, e.g. Kurchatov and NCSA, plus 20-30 Mbps to Europe and CERN. The main problem is getting the budget; they hope the 20-30 Mbps will be funded next year, and one possible source is INEAS (a foundation to improve European connectivity to the FSU).

The connectivity from BINP Novosibirsk to KEK is currently 128 kbps, and onward to Moscow is also 128 kbps. The KEK to BINP link will be upgraded to 512 kbps with assistance from DoE/SLAC.

There have, so far, been no discussions or relations with GEANT on how to connect to Russia.

Canada - Dean Karlen

CA*net3 has 8 wavelengths available at OC-192. It connects to STARTAP, and there is also a 155 Mbps link to GEANT. On Monday December 11 CA*net4 was announced, with $100M for 20 years; it will have customer-owned wavelengths. Looking at the current connectivity measured by PingER: connectivity within Canada is acceptable; to ESnet good to acceptable; to .edu very varied; to Germany there has been a big improvement and it is now good to acceptable; the UK is similar; CERN is excellent; Italy good. Carleton is also making higher frequency measurements and deriving throughput estimates using MSS/(RTT*sqrt(loss)) (the formula sketched above), and is starting netperf measurements.

Network performance measurements - Les Cottrell

The SLAC-led IEPM/PingER project continues to gather data and to be developed by SLAC and FNAL with contributions from DL and UCL. There are about 32 monitoring sites in 14 countries, monitoring hosts in 72 countries; between them these cover all the countries that have computer sites in the Particle Data Group booklet, over 78% of the world's population and over 99% of the world's Internet-connected population. It is very lightweight in its impact on the network and in its requirements on the remotely monitored sites. It provides information on several metrics including round trip time (RTT), jitter, various measures of loss, and duplicate and out-of-order packets. It is particularly valuable for links with limited performance (e.g. to the developing world). The data goes back almost 6 years. Data and analyzed graphical and tabular summaries, with user selection and drill-down, are available for viewing worldwide via the web. The IEPM web site receives over 2000 hits per day. The main community it caters to is HENP, but the over 500 monitored sites include many national labs, commercial sites, and the IPv6 test network.
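PingER's core measurement is deliberately simple: periodically send a small burst of ICMP echo requests to each remote host and derive RTT, loss and jitter from the replies. A minimal sketch of that style of probe in Python; the burst size, pacing and jitter definition are illustrative assumptions, not PingER's actual configuration:

    # A minimal PingER-style active probe: ping a host and summarize
    # RTT, loss and jitter. Burst size and interval are assumptions.
    import re
    import statistics
    import subprocess

    def probe(host, count=10, interval_s=1.0):
        """Ping `host` and return simple PingER-like metrics."""
        out = subprocess.run(
            ["ping", "-c", str(count), "-i", str(interval_s), host],
            capture_output=True, text=True, check=False,
        ).stdout
        # Linux ping prints lines like: "... icmp_seq=1 ttl=54 time=152 ms"
        rtts = [float(m) for m in re.findall(r"time=([\d.]+) ms", out)]
        if not rtts:
            return {"host": host, "loss": 1.0}
        lo = min(rtts)
        return {
            "host": host,
            "loss": 1.0 - len(rtts) / count,
            "min_rtt_ms": lo,
            "avg_rtt_ms": statistics.mean(rtts),
            # one simple jitter measure: mean deviation above the minimum RTT
            "jitter_ms": statistics.mean(r - lo for r in rtts),
        }

    if __name__ == "__main__":
        print(probe("www.cern.ch"))

The lightweight nature of the technique is visible here: a handful of small packets per burst, and no software needed at the remote end beyond an ICMP responder.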

PingER measurements indicate that performance for most developed-nation links is improving at up to 80% per year, though in some cases (e.g. ESnet to ESnet sites) it is beginning to flatten out. Some of this flattening may be due to the measurement mechanism for high performance links. Links are still poor to bad to developing areas such as Latin America, Africa, the Middle East, the Indian subcontinent, the Former Soviet Union and Eastern Europe.

To meet the need to measure and understand high performance links and Grid applications such as file replication, SLAC is setting up a project to measure network and application (various file copy mechanisms) throughputs. This includes about 30 high performance remote sites with high speed links (>= 10 Mbits/s) in 8 countries. Some of the early questions we wish to address include: how to optimize the TCP window sizes and the number of parallel streams; the duration and frequency of measurements; host dependencies (OS, CPU utilization, bus, disk and interface performance); the impact on other users; the use of QoS; application steering using network information and/or application self-limiting; comparing the various file copy tools against one another, against other bandwidth prediction tools, and against simpler mechanisms such as PingER; and whether compression is useful and under what circumstances. So far we have been making measurements of ping, traceroute, iperf, and bbcp since mid-October 2001, and are starting to analyze the data.
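A first cut at the window-size question is the bandwidth-delay product: a single TCP stream needs a window of roughly bandwidth times RTT to fill a path, and where the usable window is capped, parallel streams can make up the difference. A small sketch of the arithmetic; the link figures and window cap below are hypothetical, chosen only to illustrate the calculation:

    # Bandwidth-delay product sketch for sizing TCP windows and choosing
    # a number of parallel streams. Link figures are hypothetical.
    import math

    def bdp_bytes(bandwidth_bps, rtt_s):
        """Window (bytes) needed for one TCP stream to fill the path."""
        return bandwidth_bps * rtt_s / 8

    if __name__ == "__main__":
        bw, rtt = 622e6, 0.170       # e.g. a 622 Mbps path at 170 ms RTT
        need = bdp_bytes(bw, rtt)    # about 13 MB
        max_window = 1 << 20         # assume the OS caps windows at 1 MB
        streams = math.ceil(need / max_window)
        print("BDP = %.1f MB -> about %d parallel streams at a 1 MB window"
              % (need / 1e6, streams))
        # roughly equivalent iperf invocation: iperf -c <host> -w 1M -P <streams>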

Early results indicate:

The next steps are to make the measurement-taking more robust, to further understand the impacts of compression, and to add and understand bbftp, GridFTP and other bandwidth measurement tools such as pipechar, pathrate etc. Following this we will select a representative minimum subset of tools to make measurements with, improve the reporting/graphing/table tools, and make the data available via the web. We also hope to tie together the measurements being made in the UK with the SLAC measurements so they appear more integrated to the user. A further goal is to make predictions of throughput from the measurements and to instrument an application to take advantage of this.
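One simple way to turn a measurement history into a forecast, in the spirit of predictors such as NWS mentioned below, is an exponentially weighted moving average. A minimal sketch; the smoothing factor and the sample values are arbitrary assumptions:

    # A minimal throughput predictor: exponentially weighted moving
    # average over past measurements. alpha is an assumed smoothing factor.
    def ewma(samples_mbps, alpha=0.3):
        """Return a one-step-ahead forecast from a measurement history."""
        forecast = samples_mbps[0]
        for s in samples_mbps[1:]:
            forecast = alpha * s + (1 - alpha) * forecast
        return forecast

    if __name__ == "__main__":
        history = [85.0, 92.0, 78.0, 88.0, 90.0]   # made-up iperf results, Mbps
        print("predicted next throughput: %.1f Mbps" % ewma(history))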

Network connectivity & performance: a view from the UK - Richard Hughes-Jones

The SuperJANET4 backbone & links are supplied by WorldCom. The core is running well; there may be problems in the Metropolitan Area Networks (MANs) or site networks. Losses were observed from RAL to CERN due to the 155 Mbps ATM circuit from DANTE to UKERNA, which only supported 133 Mbps. Connectivity to Europe is now via GEANT. There are now 6*155 Mbps (POS) links from the UK to the US, about to go to 2.5 Gbps, which will be split between commodity and research traffic; they peer at Hudson St, New York. Usage is 88% of the total 930 Mbps measured over a 10 minute interval, i.e. close to the limit, though the daily average is closer to 50%. Traffic from the US is about 3 times that from the UK.

Grid network projects: DataGrid; GridPP (extending DataGrid to BaBar, D0, CDF, ...) in the UK; DataTAG, whose foci are network research and Grid interoperability; and MB-NG (Managed Bandwidth Next Generation), a UK core-science-funded 2.5 Gbps development network with optical switching. DataGrid WP7 is network monitoring. Several tools are in use: PingER, RIPE one way times, iperf, UDPmon, rTPL, GridFTP and NWS prediction. Continuous tests have run for the last few months to selected sites: DL, Manchester, RL, UCL, CERN, Lyon, Bologna, SARA, NBI, SLAC. The aims of monitoring for the Grid are to inform the applications, via the middleware, of the current status of the network (as input to the resource broker and scheduling), to identify fault conditions in the operation of the Grid, to understand the instantaneous, day-to-day and month-to-month behavior of the network, and to provide advice on configuration. A report has been written on using LDAP. UDPmon measures throughput with UDP: it sends a burst of UDP frames spaced at regular intervals, which also yields one-way jitter.
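The UDPmon technique is easy to sketch: pace sequence-numbered UDP frames onto the wire at a fixed spacing, and have the receiver derive achieved rate, loss and jitter from the arrivals. The following is a minimal illustration of that idea in Python, not UDPmon's actual wire format; the frame size, burst length and spacing are assumptions:

    # A minimal UDPmon-style probe: send a burst of sequence-numbered UDP
    # frames at fixed spacing; the receiver derives rate, loss and jitter.
    import socket
    import struct
    import time

    FRAME = 1400       # payload bytes per frame (assumed)
    BURST = 100        # frames per burst (assumed)
    SPACING = 0.001    # seconds between sends (assumed)

    def send_burst(host, port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        payload = bytes(FRAME - 4)
        for seq in range(BURST):
            sock.sendto(struct.pack("!I", seq) + payload, (host, port))
            time.sleep(SPACING)

    def receive_burst(port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))
        sock.settimeout(5.0)
        arrivals = []
        try:
            while len(arrivals) < BURST:
                sock.recv(FRAME)
                arrivals.append(time.time())
        except socket.timeout:
            pass                      # lost frames end the burst early
        gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
        if gaps:
            rate = FRAME * 8 * len(arrivals) / (arrivals[-1] - arrivals[0])
            jitter = sum(abs(g - SPACING) for g in gaps) / len(gaps)
            print("%d/%d frames, %.1f Mbps, jitter %.2f ms"
                  % (len(arrivals), BURST, rate / 1e6, jitter * 1e3))

Because UDP has no congestion control, a probe like this measures what the path can carry rather than what TCP achieves, which is why its results correlate with, but sit above, iperf's.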

An LDAP interface has been added to PingER for users to access the data. Monitoring is also done with Netmon, UDPmon, iperf and RIPE. Nice correlations are seen between PingER and UDPmon, and between iperf and UDPmon, so UDPmon might be able to give a rough throughput estimate. The NWS predictor tracks the average. Iperf from UCL to SARA now gives 90 Mbps with a 100 Mbps NIC on the local host.

How do we monitor at Gbps? We will need tcpdump on steroids.

DataTAG is approved: CERN/INFN/INRIA/UvA/PPARC are the partners. Interoperability is needed between the Grids in Europe & the US (PPDG, GriPhyN, iVDGL).

MB-NG will use MPLS with managed bandwidth and QoS provision.

GridPP and DataGrid have been working on packaging Globus as an .rpm kit. This makes installation simpler, and it is available from the DataGrid WP6 web site.

SuperJANET4 is stable and working well. Access to campuses is becoming important.

David Williams

The EU commission is interested in trying to improve global connectivity to less well connected regions of the world on a political basis. There are four regions: the Balkans (S.E. Europe), the newly independent states (NIS) of the FSU, N. Africa, and S. America. There is a project (UMednet, signed and now happening) to connect many of these countries into GEANT (not into the 10 Gbps core but at the periphery). GEANT now covers 31 countries. There is a political agreement to improve connectivity to S. America (signed and happened). The Balkans are more complex: there are many schemes, some by countries, some by NATO, UNESCO etc. States that will accede to the EU (Accession States) will have special arrangements. The SCIC should pass on information about opportunities and support relevant proposals. A prerequisite is that universities have good connectivity.

The EU Sixth Framework program (the Fifth, which gave the impetus for DataGrid, DataTAG & TEN-155, runs out in 2001) would like to have funds flowing from it within 12 months. It may be approved in the coming weeks, or else in June. There are still discussions on foci and funding. It will last for 5 years.

French connection - Denis Langlin

RENATER 2 is connected to GEANT at 2.5 Gbps. A tender for RENATER 3 is in the pipeline; the OC3 & OC12 links will go to 2.5 Gbps. The link to STARTAP will improve to 622 Mbps next year. The budget for RENATER is 20M euros per year. RENATER is not a particle physics offspring; it was set up in parallel to the particle physics networks. Monitoring statistics are not always available.

HEP connectivity: nationally, 18 French Labs are connected by private links to a RENATER POP (not via a university) and thence to CCIN2P3. Each Lab has reserved bandwidth of between 2 and 34 Mbps. Most international connectivity is private, and 80% of it is due to BaBar. International links: 34 Mbps of reserved bandwidth within RENATER to Chicago, used for BaBar uDSTs and well saturated using bbftp. The rest of BaBar, D0, CERN etc. goes through CERN and thence via GEANT and USLIC to the US, using 80% of the line. The CERN-Lyon link is 155 Mbps, going to 622 Mbps in June/July. They are transmitting 1 TByte/day from SLAC to IN2P3 (about 93 Mbps sustained); the current limit is 100 Mbps through CERN. RENATER goes to 100 Mbps in January 2002. Performance is very sensitive to local bottlenecks; both lines appear to be rate limited at the moment, and it is not understood why. CCIN2P3 has 10 Gbps to Paris and then goes via GEANT at 2.5 Gbps to CERN. It is unclear whether the private line to CERN will be kept. Particle physics is driving research networks at the moment.

Politicians do not like parallel independent research networks. In France a research network, VTHD, has been set up, but particle physics is not allowed to enter it as a researcher, only as a user (i.e. HEP are not regarded as network researchers). The same holds for DataTAG: INRIA can enter but not IN2P3. The only way for IN2P3 to get involved is via WP7, but the French leadership of WP7 has been given to INRIA.

CERN - Manuel Delfino

The LHC computing Grid project has been approved. DataTAG is 50% network research and 50% interoperability between Grid projects. Links will be implemented using wavelengths. NIKHEF has set up a wavelength link to CERN. The CERN-US link often gets attacked. There is a request to close the triangle with a wavelength link from Amsterdam to CERN. Within Switzerland there will be better connections between the ETH schools, probably using wavelengths, and these will probably connect to France & Germany. CERN has set up the OpenLab project (see http://cern.ch/openlab/) with Cabletron, KPN & Intel. The CERN openlab is a collaboration between CERN and industrial partners to develop data-intensive Grid technologies to be used by the worldwide community of scientists working at the next-generation Large Hadron Collider. They hope other vendors will join.

Germany and FSU - Michael Ernst

Transatlantic connectivity is 2*622 Mbps, terminating in NYC and in Hamburg and Frankfurt. US-Germany traffic is peaking at 120 TB/month, and the traffic is becoming more balanced between the directions, with the from-US direction still bigger. DANTE plans direct links to Abilene/CANARIE in 2002. UCAID will add another 2*2.5 Gbps and is proposing for 2002 a joint project, GTRN (Global Terabit Research Network). An upgrade from 2*622 Mbps to 2*2.5 Gbps is expected in 1Q2002, with contracts between DFN and two providers, Global Crossing and KPNQwest (2.5 Gbps each), tendered in collaboration with DANTE and with the same termination points. ESnet peering is at 34 Mbps.

GEANT has 9 trunks at 10 Gbps and 11 at 2.5 Gbps. It now covers 31 countries (6 just added). There is guaranteed QoS (see http://www.dante.net/sequin/). There are big differences between the lowest and highest bid offerings (e.g. today the average is 5000 and the minimum is 40). The 2.5/10 Gbps trunks are SDH on a single wavelength; 10 Gbps costs only 10% more than 2.5 Gbps. Traffic to Europe from DFN is about 40 TByte/month (one third of the transatlantic volume). DESY has 155 Mbps to DFN. The bottlenecks are at the universities, not in DFN, so there are no requests for managed bandwidth.

FSU connections go via the DESY satellite. Operational satellite links: Yerevan 192/198 kbps, Minsk 32/32, Almaty 512/128, Baikal 38/38, ... Now there is the SILK project with NATO backing; it initially covered the Caucasus and Central Asia (8 countries). Bandwidths from NATO are currently 64-512 kbps; they want to go up by a factor of 10, which is not affordable with the current model. There is no affordable fiber in the Caucasus or Central Asia, so a VSAT technology upgrade is proposed: Mbps capacity can be had for $25K/year. It is funded by NATO in the range of $2.5M, co-funded by the national governments. It will be staged, and bandwidth will increase from 1 to 5 to 11-25 Mbps; new funding will allow further increases up to 50 Mbps. The satellite provider is a French/Turkish collaboration (EurasiaSat). DESY will provide technical management (the earth station is at DESY). The project is ready to start; the technical and organizational parts are now in place. It will start in early 2002.

Japan & KEK - Yukio Karita

http://www-nwg.kek.jp/~karita/icfascic-dec01.ppt

SuperSINET will start service in January 2002. It is intended to be fully photonic, with 10 lambdas (wavelengths) from KEK: one for the 10 Gbps IP service and 7 for direct GbE (or 10GbE) to university HEP groups. There will be a HEP VPN within SuperSINET using MPLS for HEPnet-J. The NII US/Europe SuperSINET plan for 2003 has 2 lambdas to the US West Coast (a lambda is 2.5 Gbps): one for the IP backbone and one for direct GbE to be used for a KEK-CERN testbed (CANARIE can bridge the SuperSINET lambda and the DataTAG lambdas). A minor upgrade in January 2002 adds 5 OC3 POS circuits for default IP traffic. In January 2002 KEK-ESnet will increase from 10 to 20 Mbps.

Links within Asia: the KEK-Taiwan (Academia Sinica) 1.5 Mbps frame relay link is to be merged with Academia Sinica's 45 Mbps APAN link. KEK-BINP is 128 kbps, to be upgraded to 512 kbps, waiting on US/DoE/SLAC support for the Russian half circuit. The KEK-IHEP (Beijing) 128 kbps link was to be merged with CAS but may continue to exist. APAN has many inter-regional lines, mainly centered on Japan. There is an ACFA network project, which will use APAN for many connections.

Harvey Newman information from Olivier Martin

The US-CERN link consortium (USLIC) link has been reliable apart from one 1-day outage during a holiday, and an occasion when the bill for the phone line used to log in to the router went unpaid, so the link was disconnected. The Cisco PIX firewall has a throughput problem, so a bypass is being looked at. CERN has Gbps connectivity to SWITCH; to GEANT it has 2.5 Gbps shared (CERN has 1.25 Gbps, and it is expected that access bandwidth to GEANT will double every 12-18 months); and there is 2*155 Mbps to the US. CIXP continues to grow; CERN has 200 fibers coming in. A market survey was done with a call for tender: it was sent to 29 providers, 19 replied, 17 were invited, 13 responded, the short list was 4 providers, and the final decision is on Dec 12. There will be 622 Mbps for production and 2.5 Gbps for DataTAG testing. DataTAG is 3.9M euros; the partners are INRIA, INFN, CERN, U of Amsterdam and PPARC, together with US (NSF) partners.

Discussions etc

We need to focus on what to present to ICFA. We will have 30 minutes for the report; each presenter should make a short (up to 2 page) summary of their area, to be submitted before Xmas. Harvey will put these together into an executive summary and get feedback. The talks will be made available on the web; Matthias will put them there. So what is the message we want to present to ICFA?

The next meeting will be after the ICFA meeting (at SLAC in February). We would like to hold it soon after ICFA; the first week of March is favorable (Saturday 9th March at CERN).

