I visited Daresbury Laboratory (DL) and met with Paul Kummer and Robin Tasker. Robin is an active member of the European Data Grid (EDG) Work Package 7 (WP7). WP7 have taken the SLAC-developed PingER and extended it to incorporate the throughput measurement tools iperf and UDPmon as a prototype. They have made this into a distributable package (a tarball) and it is now in use at about a dozen EDG sites. I showed them the new SLAC-led IEPM-BW measurement infrastructure and results. We need to maintain contact between the two projects since there is a common base and a lot of complementarity. We agreed it is still premature to join the two projects since both are in prototype stages to find out what works and is useful. We will keep each other informed of developments and will provide links from each web site to the other's site. We looked at the throughput achievable from SLAC to DL, which appeared to be limited to about 40Mbits/s. I was able to identify the limit as the TCP maximum window sizes being too small. Robin increased the windows and we were able to roughly double the achievable throughput. We are now limited by the Fast Ethernet interface on the host at Daresbury, so Robin agreed to upgrade to a Gigabit Ethernet connection. We also discussed a QoS project being proposed in the UK; when ready (late this year) it will be extended from the UK to SLAC.
I have a much better understanding of the European grid projects, high speed networking initiatives and how the monitoring is coming along. This will greatly facilitate coordinating the DoE funded measurement activities with those in Europe. This is critical due to the international nature of HENP and grid activities.
At Daresbury I contacted Robin Tasker and Paul Kummer of the Daresbury Lab computing/networking group.
The first meeting was at the Daresbury Laboratory near Liverpool, England.
The meeting with Richard Hughes-Jones was at CERN, Geneva, Switzerland.
The International Committee on Future Accelerators (ICFA) Standing Committee on Inter-regional Connectivity (SCIC) was at CERN, Geneva Switzerland
Paul Kummer, Robin Tasker
I visited Daresbury Lab on March 7 '02, and met with Robin Tasker and Paul Kummer.
SuperJanet 4 (SJ4) core (Glasgow, Edinburgh, Leeds, Manchester/Warrington, London, Reading, Portsmouth, Bristol) is a production net running at 2.5Gbits/s. It will be upgraded to 10Gbits/s in April 2002 (see www.superjanet4.net). The development network is Reading, Leeds, Warrington/Manchester, London and is 2.5Gbits/s. Transatlantic connectivity has just changed: commodity Internet access is 2.5Gbps via a London provider. They also bought a 2.5Gbps circuit from London to NY, and just concluded peering at 622Mbits/s to both ESnet and Abilene. GEANT is now running and SJ4 has peering to GEANT at 2.5Gbps in London. The GEANT backbone is 10Gbps. UKERNA are beginning discussions on whether to use GEANT for production service to the US and free up the bought 2.5Gbps for development purposes, e.g. if the MPLS project goes ahead then they might use this freed-up capacity. Daresbury has 155Mbps to Manchester, which may go to 622Mbps (RL has 622Mbps going to 2.5Gbps this summer via the Thames Valley Network (TVN)). The internal network at RAL is using Nortel. DL is also looking at 1Gbps to Warrington; BT offer 1GE over 25 km at $60K install and $50K/year. DL is also looking at getting dark fiber, but that requires digging trenches; dark fiber to SONET was a factor of 4 cost increase. The internal network at DL is using Extreme.
DL use wireless only on a visitor subnet. The visitor subnet has dynamic DHCP and is implemented as a VLAN; it will be segregated from the rest of the site. The site has a stateful firewall (with state for FTP) running on a Unix/Linux-type 1.2GHz Intel box supporting 3 domains; it keeps up OK with 155Mbps. RAL has a similar setup on their 622Mbps link, handling current traffic of over 100Mbps. There is also a bastion host for ssh logon to a subnet inside DL. Encryption is run on an IP tunnel between RL and DL on BorderGuard boxes, which constrict the flow; e.g. H.323 does not work well between RL & DL, and the RTT goes up from 10ms to 30-40ms under load.
The DL host was set to 64KB maximum windows, so we increased to standard large windows and re-measured. Doing this increased the achievable throughput from 40Mbits/s to over 70Mbits/s. Robin will be looking to get a GE NIC for RTLIN1. RL is now running at about 100Mbps as seen from SLAC.
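As a back-of-the-envelope check (the figures below are assumptions for illustration, not measurements from this visit), the window a single TCP stream needs to fill a path is the bandwidth-delay product. With a transatlantic RTT of order 150ms, a 64KB window caps a single stream at only a few Mbit/s, so the ~40Mbits/s observed before tuning presumably came from several parallel streams:

```python
# Bandwidth-delay product: the TCP window needed to keep a path full.
# Illustrative figures only -- an assumed 100 Mbit/s Fast Ethernet
# bottleneck and an assumed 150 ms SLAC-UK round-trip time.
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    return bandwidth_bits_per_s * rtt_s / 8

def max_throughput_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on single-stream TCP throughput for a given window."""
    return window_bytes * 8 / rtt_s

rtt = 0.150  # assumed round-trip time, seconds
print(f"Window to fill 100 Mbit/s: {bdp_bytes(100e6, rtt) / 1024:.0f} KB")
print(f"A 64 KB window caps one stream at "
      f"{max_throughput_bits_per_s(64 * 1024, rtt) / 1e6:.1f} Mbit/s")
```

The same arithmetic explains why raising the windows roughly doubled the achievable rate, and why the Fast Ethernet NIC is now the bottleneck.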
Robin downloaded and built the NWS distribution (NWS-2.05). There is a 'c' executable called add_forecasts (see Doc/user_guide.html) that one gives the data to, and it forecasts the next measurement. Warren has a copy of Robin's code.
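To make the add_forecasts idea concrete, here is a minimal stand-in for what it does: feed it the measurement history, get a prediction of the next value. NWS itself runs a battery of forecasters and uses whichever has the lowest recent error; this sketch uses simple exponential smoothing only, and the sample data is made up:

```python
# Minimal stand-in for NWS-style forecasting: predict the next
# measurement from the history using exponential smoothing.
# (NWS proper selects among several forecasting methods; this is
# an illustrative simplification, not the NWS algorithm.)
def forecast_next(history: list[float], alpha: float = 0.3) -> float:
    """Exponentially smoothed forecast of the next measurement."""
    if not history:
        raise ValueError("need at least one measurement")
    estimate = history[0]
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate

throughputs = [38.0, 41.5, 40.2, 39.8, 42.1]  # made-up Mbit/s samples
print(f"predicted next: {forecast_next(throughputs):.1f} Mbit/s")
```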
WP7 (Work Package 7) is part of the European Data Grid (EDG); it was started by HEP and is now an EU-funded project. Warren Matthews of SLAC is on the mailing list. They have developed a package for making measurements with PingER, UDPmon and iperf, saving the results in the standard output file formats used by each application. One can then use the standard web tools (e.g. the PingER ping_data_plot.pl tool from SLAC) to visualize, or use HTTP GET to extract the data and present the derived metric to an LDAP server (Alex Martin of QMW) for publication. WP7 have a tarball/RPM to allow installation of their PingER/iperf/UDPmon measurement/recording tools plus analysis/reporting tools (see http://ccwp7.in2p3.fr/).
EDG is looking at R-GMA (a relational version of the GGF Grid Monitoring Architecture). Steve Fisher of RL is driving this along, and Peter Clarke has a post-doc (Paul Mealor) who is working on this.
The critical thing will be to get a standard format/schema so there is a common interface for the middleware; underneath, how the measurements are made should not matter.
This project started in January 2002. It is at the stage where the different WPs are working out what needs to be done and who needs to do it (i.e. developing their program of work). Robin is the WP2 leader (high performance networking), looking at enhanced TCP or other transports, QoS and advanced reservation. Monitoring is being done in WP3 by the Dutch (Cees de Laat is one of the leaders).
An extension out of DataTAG (2.5 Gbps link from CERN to Chicago) is to add in NetherLIGHT from Amsterdam to StarLIGHT (will need an optical link between CERN and Amsterdam), see http://datatag.web.cern.ch/datatag/.
DT WP2 wants to look at QBSS since it appears to be an obvious candidate. They will try it across the UK testbed (between Manchester, Rutherford and UCL) and the DataTAG development network. After running the testbed, the idea is to extend the test network to true end-to-end performance; e.g. at the end of the UK project we might want to do tests between SLAC and DL, using class C networks at each end with the appropriate announcements. Timescales on the UK project are to have access to the UKERNA development network in summer '02 and run a set of tests then, with end-to-end tests to SLAC perhaps a year later.
A grid network team has been set up in the UK. David Hutchison of Lancaster is the chair with Peter Clarke the deputy. There are 9 or 10 eScience centers set up in the UK (including RL and DL). One of the first tasks is to set up monitoring between the eScience centers and publish the information using information servers such as LDAP. As a first demonstration they will put out the WP7 tools. They have funding for a person for 2 years to work on this at DL.
Richard met Ruth Pordes of FNAL/PPDG in Paris earlier in the week. He showed her the EDG WP7 PingER/UDPmon/IperfER package and she appeared impressed and pleased that there is a strong collaboration between the SLAC IEPM group and WP7.
Brian Tierney has stepped down from co-chairing the GGF working group defining metrics, and Richard Hughes-Jones is now co-chair with Jenny Schopf of ANL.
We discussed how to publish the data. WP7 uses scripts to read the data and put it into an LDAP-accessible format for all their data types. It probably makes sense for us to look at the LDAP code for PingER and see about integrating it into the standard PingER package. It would probably also be interesting to put together a script that makes the data available in XML format. The EDG also have an RPM installation package for Globus.
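A script along these lines could be quite small. The sketch below is purely hypothetical: the record field names (src, dst, rtt_ms, loss_pct) are assumptions for illustration, not the actual PingER or WP7 schema, which would need to be agreed as discussed above:

```python
# Hypothetical sketch of exporting PingER-style records as XML.
# The field names used here are illustrative assumptions, not the
# real PingER output format.
import xml.etree.ElementTree as ET

def records_to_xml(records: list) -> str:
    """Serialize a list of measurement dicts as an XML document."""
    root = ET.Element("measurements")
    for rec in records:
        m = ET.SubElement(root, "measurement")
        for key, value in rec.items():
            ET.SubElement(m, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

sample = [{"src": "slac.stanford.edu", "dst": "dl.ac.uk",
           "rtt_ms": 152.4, "loss_pct": 0.1}]
print(records_to_xml(sample))
```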
WP7 is also using the NWS predictor which is easily separable and callable from an application that provides the data. SLAC will look at using this for making predictions from IEPM-BW data.
Richard will provide IEPM-BW with a host to monitor at Manchester University.
UDPmon uses the NIC card clock to increase the accuracy of timing packets. By default it sends 300 * 400 Byte packets (i.e. ~33Mbits/s). Richard found 0 packets was unreliable; 200, 300 or 400 packets appear not to run into packet loss, while 1000 does exhibit loss. It measures the arrival times of the packets.
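As a sanity check on the quoted default rate: 400-byte packets at ~33Mbits/s imply an inter-packet spacing on the order of 100 microseconds. That spacing is an inference from the two quoted numbers, not a documented UDPmon parameter:

```python
# Average offered load of a burst of equally spaced UDP packets.
# The 97 us spacing below is inferred from the quoted ~33 Mbit/s
# default, not taken from the UDPmon documentation.
def offered_load_mbps(n_packets: int, packet_bytes: int,
                      spacing_us: float) -> float:
    """Average rate of a burst of equally spaced packets, in Mbit/s."""
    bits = n_packets * packet_bytes * 8
    duration_s = n_packets * spacing_us * 1e-6
    return bits / duration_s / 1e6

print(f"{offered_load_mbps(300, 400, 97):.1f} Mbit/s")
```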
The WP7 site has a CVS repository of the modified PingER, UDPmon and iperfER packages. Richard has a modified version of the PingER data query code that allows selection of the column to use.
There are some demos planned for iGrid2002. There will be 6 high performance PCs, each with 4 * 66MHz x 64 bit PCI buses, using a SuperMicro motherboard. They have tried Intel and SysKonnect NICs. I believe Olivier Martin has procured the machines, and 2 will be sent to Richard in the UK. This is part of the DataTAG work.
Besides the EDG WP7, there are 3 DataTAG WPs of interest: WP2 for high performance throughput and QoS (headed by Robin Tasker), WP3 for applications, and WP4 for harmonization and inter-working between EDG and DataTAG. DataTAG has links between STARlight and Amsterdam, and Amsterdam and the UK, and this summer is hoping for 3.5Gbps from Amsterdam to CERN. They will be using Cisco 15454 multiplexers.
We agreed to try and follow up this meeting with further visits so we can ensure common progress between the IEPM group and WP7. I will visit Manchester in April for 2 days (on the way to Romania), and maybe also visit Peter Clarke at UCL around the same time; Richard will visit SLAC before the next ICFA/SCIC meeting.
Action items include:
RHJ get host for IEPM-BW to monitor at RAL
SLAC get the NWS prediction code and try it out.
SLAC look at RHJ's query code modification to add column selection and incorporate it into PingER if appropriate.
SLAC install and try out UDPmon in IEPM-BW; I have a lot of the code on my laptop.
SLAC look at LDAP code from WP7
Les investigate getting a copy of the SC2001 real time PingER/iperf demo for iGrid2002.
We will try and make progress on these action items before the next meeting in Manchester.
For talks see http://l3www.cern.ch/~newman/ICFASCIC/
Harvey welcomed all and reviewed the agenda. Then he summarized his presentation to the ICFA committee at SLAC on February 15, '02. Harvey is now the chairman of the SCIC. ICFA requested Harvey, as chair of the SCIC, to come back in July (in Amsterdam) and again in October with updates. At this meeting Harvey wants to review the membership to ensure we have the needed global coverage. The critical items for the SCIC are tracking national & international infrastructures, looking at remote regions (special cases, generic problems), and encouraging R&D. There was an important Grid workshop in Brazil; Brazil wants to get more active in Grid collaborations. AMPATH will connect S. America, and the bankruptcy of Global Crossing needs to be followed.
Manuel pointed out that, for the next report, the Lab directors need a 1 page (front) summary of what they need to do.
The UK used to have 6*155Mbps to the US. They now have 2.5Gbps between NY and London, with 622Mbps to both ESnet & Abilene. The commodity peering is done in London rather than NY. The change in peering reduced the load from 90% full to about 20% full, in both traffic directions. PingER does not detect major changes to FNAL and SLAC. To Europe they have 2.5Gbps peering to GEANT, and now have excellent connectivity to CERN & DESY.
Russia has 2 international links, one to Nordunet via St Petersburg and Helsinki to STARnet, soon to be 155Mbps (currently 34Mbits/s). The Russian universities' RUNNet provides this link; RUNNet is supported by the ministry of education. ITEP connects to DESY & CERN via RUNNet. There is a Fastnet link from Moscow to Frankfurt at 155Mbps run by Teleglobe; currently this is missing some technical work in Frankfurt to connect to GEANT. DFN has signed an agreement to peer in Frankfurt. DANTE does not have the mission to connect RBNet to GEANT, so they peer through DFN. There is also a satellite link from Russia via DESY. BINP/NSK has a 128kbps landline via KEK to the US that will be upgraded to 512kbps in March '02. There is also an NSK fiber optic link to Moscow at 30Mbits/s run by the Russian Backbone Network (RBNet), which carries both commodity and scientific projects. They will start 10Mbits to CERN in April. There are budget problems, which they are working on.
There are big financial difficulties. There is effectively a 53MSF shortfall, and the LHC cannot be cut. They hope to unfreeze the 53MSF in March, but it may be delayed until June, which will hurt longer, so they are being very conservative. There is a request from SURFnet to bring in a lambda to close the DataTAG triangle. The LHC will be started in April 2007. CERN is under pressure to postpone upgrades, including the US link, but would then fall behind the technology. From the 53MSF they will give 33MSF to the LHC, so they will still be short of money in the DD division. They are optimistic about accepting the lambda link from SURFnet in the near future, and will probably use it initially simply by converting photons to electrons. CERN is still fully behind GEANT. GEANT is mainly a production service, so accepting the SURFnet link is for experimental testing etc., and will not impact CERN's interest in GEANT for the general production network.
CERN has 2*2.5Gbit links, 1 for SWITCH, 1 for CERN. CERN is concerned that DANTE is not collaborating with DataGrid and DataTAG. They will upgrade (April 1st 2002) the Chicago link from 2*155Mbps to 622Mbps, to be terminated at Starlight. Harvey wants end-to-end high speed connectivity between capable servers at CERN, Starlight and FNAL.
90% of current international needs are to SLAC. They used to have 2 links, via Renater and via CERN. Renater used to be 30Mbps, the CERN link was 155Mbps. The Renater link developed problems and could only achieve 20Mbps, then dropped further to 5Mbps. Then it was fixed, but nobody knows why. Renater has now increased to 155Mbps to STARtap, with IN2P3 getting 100Mbps but limited to 50Mbps by ESnet. ESnet will upgrade its link to Renater from 155Mbps to 622Mbps, and can then raise the 50Mbps limit. The CERN backup has been very valuable, especially for a recent workshop. The Lyon-CERN link will go to 622Mbps in July '02. The goal is to have 622Mbps from IN2P3 to SLAC for BaBar by the end of the year.
BaBar computing is driving networking at SLAC. Continued growth is expected for the next 4+ years, with a rapid transition from SLAC-based to distributed grid computing, so less hardware growth at SLAC and increased grid effort and dependency. BaBar is using tiered regional centers: the tier-A centers offer resources at the disposal of all of BaBar, each providing 10% of BaBar's total computing/analysis needs. CCIN2P3 is already in use, with 130 cpus, 25TB and 155Mbits/s in 2001, going to 380 cpus, 45TB and 622Mbits/s in 2002; they also have a PB mass storage system behind the disks. The other tier-As are INFN/Padova, in 2002 with 13TB and 390 (30 Spec95 unit) cpus (27TB and 810 in 2005), and RAL with 20TB and 200 Spec95 units (40TB and 500 in 2005). Most SLAC external traffic is growing (100-160Mbits/s from Aug 2001 to Feb 2002), now 120-140Mbits/s. The Renater link problems started in Dec 2001 and were fixed in March; the link is now upgraded, with ESnet limiting to 50Mbits/s. BaBar & SLAC submitted to ESnet over a year ago a request to upgrade the SLAC link, with requests to upgrade peering links, but it appears to have come as a surprise to ESnet, and they do not have the capacity needed to Renater. ESnet has money problems: its yearly budget has been flat at $16M for a long time, yet there is a big increase in network utilization, and not just by HENP. MICS runs NERSC, ESnet and some science research. We (HENP) have to enunciate our concerns about ESnet's lack of backbone capacity. We will probably have to provide more money; at some stage it might become more attractive to use Internet 2. Richard is now on the ESnet steering committee. ESnet will connect to STARlight at 1GE. CERN is perplexed by the situation in the US: CERN collaborates and funds links across the Atlantic, but then runs into problems within the US. Internet 2 is working well, and the ESnet backbone is an embarrassment.
They expect to upgrade from 155Mbps to 622Mbps in 2 months. They also have another 155Mbps to NREN/I2. They are looking at local arrangements to get good connectivity to STARlight, including the railway company and the electricity company (COMnet) for dark fiber. COMnet has fiber on the FNAL site and at STARlight. There is no schedule yet; it could be over the next half year. FNAL is part of I-WIRE, so that might be another option when it becomes a reality.
SuperSINET started Jan 4, 2002; the NII US/EU lines were upgraded Jan 4; the KEK-Taiwan T1 frame relay was converted to APAN STM1; KEK to NSK upgrades from 128kbps to 512kbps on 26 Mar. SuperSINET has a 10Gbps backbone connecting Nagoya, Tokyo and Osaka. Belle is starting to use SuperSINET. International connectivity from NII is 300Mbits/s from KEK to GEANT via London/DANTE, and 20Mbits/s to ESnet.
GARR-B, Mar 2002: 4 backbone sites completely meshed by 2x155Mbps trunk lines, with 155Mbps to PoPs. The GARR-G pilot backbone has 2.5Gbps to GEANT, with a core including Milan & Bologna and connections to other sites at 622 and 155Mbps. Production traffic tests start Mar-Apr 2002. Question: Padova appears to have only a dotted 155Mbps line, which is hardly sufficient for the BaBar tier-A site, so it needs to be production; is it?
For the transatlantic links, DFN are planning to add 2*2.5Gbps in 1Q02 and expect delivery of the link from KPN/QWEST in 4-6 weeks from now. Problems are expected with the 2nd line, which was ordered from Global Crossing. Added TA bandwidth is to be provided in the GTRN context between Abilene/CANARIE & GEANT; DANTE is going to make probably 2*2.5Gbps available within 4-6 weeks. The SILK project, funded by NATO to provide/enhance connectivity to 8 nations in Central Asia (Armenia, Georgia, Kyrgyzstan, Uzbekistan, Azerbaijan, Kazakhstan, Tajikistan & Turkmenistan), is on track. Funds for 2001/2002 are available and the majority is being spent on equipment (Cisco routers, satellite dishes, transceivers etc.). They expect to have a hub at DESY, and basically all remote stations will be operational by May '02.
This was in draft form and not to be circulated. David wants to check and revise, but wanted feedback. See www.rdg.ac.uk/~ems97ps/seeurope.html. Terena has a report on networking in Europe available from www.terena.nl/compendium current as of mid June 2001. Slides available via http://l3www.cern.ch/~newman/ICFASCIC/ with all the other talks.
Suggestions/questions: one could use passive measurements to force non-registered big flows into QBSS; one could set one stream to best effort and the remaining streams to QBSS; the BaBar Objectivity data compression ratio to tape is close to 2. The poor Russian performance seen from the Russian monitoring sites may be because the links being monitored use the commercial Internet. The Balkans is now known by the more politically correct term of S.E. Europe.
Some suggested topics for WGs include: monitoring, requirements, advanced network technologies, remote regions, world network status, proposal, and recommendations. We need to determine which WGs we need and who will volunteer to lead. There are too many WGs for them to be staffed from within the SCIC. We could demand a 2 page written report from members for their regions and only talk about the highlights.
We need monitoring to provide a quantitative/technical view of what is happening so that we can come up with and justify our recommendations. We need to identify inter-regional connectivity requirements (intra-regional ones seem to be OK and handled by others). We also need to get better representation from the regions; some regions are very diverse (e.g. how does a Brazilian represent Argentina?), and why are there multiple representatives for W. Europe or the USA?
There was a discussion on the need for the requirements working group. ICFA wants to get requirements merged for all the experiments. To some extent this is covered by HEP-CCC. Will need others on the committee besides the SCIC members. For the advanced network technologies, do we need a person, a sub-committee, how do we get updated. It was proposed that this be a theme for each meeting with one or two talks per meeting.
It was proposed that the SCIC focuses on a different topical subject each year to present as its major focus in the yearly ICFA report. We need to put the ICFA folks into a space where they can do something such as lobby a government, ISP, funding agency, part of this is to identify the areas ICFA should be worried about and where they can do something. Three possible focus areas are: end-to-end connectivity; the digital divide (focus on problem area/regions and understand well enough to request ICFA to step in); key requirements (mainly at the high end that would not be naturally satisfied without some intervention). In addition there will be a focus on monitoring & world network status. Also will need a lead person to have responsibility to schedule talks on advanced technologies. The next step is to find lead persons to head up each of these areas. The person in charge will gather outside the committee people to assist. The lead person will also write the section for the ICFA report. The leads will be:
The lead person will define the activity, try to remove significant duplications, and sign up the rest of the committee. They will also circulate to experiment spokespersons to get any needed additional representation. The report will lead from monitoring and advanced technologies to the next three items. It will conclude with examples of pointed issues and also generic issues that we hope ICFA can do something about; even better if we can give some idea of how an issue might be addressed. We want to have a substantive report for the meeting with the Lab directors in about 10 months time.
This is a continuation of the 5 year old STARtap, but it avoids dependence on ATM, will use production 1GE, and is moving into a neutral location (not Ameritech). It is getting some funding from NSF, and is jointly managed & engineered by iCAIR (NWU), ANL, and UIC. It is a GE exchange point using a Catalyst 6509 with policy-free 802.1q VLANs. They also have Juniper routers for connection to STARtap. The first international network connection was SURFnet from Amsterdam via a Cisco 12000 GSR & 15454. ESnet is expected to connect via dark fiber at GE to the Cisco 6509 at Starlight from the QWEST POP in March/April.
CAnet3 terminates July 31, 2002. The CAnet4 RFI was issued in August 2001, funding was announced in Dec 2001 ($110M one-time grant for 5 years), and the selection of carriers and equipment is to be announced shortly. Using point-to-point optical links reduces the cost of routers since they just use STS cross-connect equipment. CAnet4 plans to turn up July 4 '02. Costs came in under expectations so they have reserved funding and may add more GigaPoPs. Major objectives: production network support for basic research, and moving circuit switching under the control of the user (i.e. universities, schools and businesses manage & control their own dark fiber). They want to empower customers to manage & control their own net, resulting in new applications and services. Lambda grids are being deployed around the world so that projects have a pool of wavelengths; they want to enable researchers to concatenate pools together. HENP is an ideal environment to start with. The current view is a centrally managed cloud of optical network, where any changes have to be made by requests to the carrier. With CAnet4 the goal is to enable an experimenter at TRIUMF/UBC to get a dedicated channel across campus to the IXP in Vancouver, then connect this wavelength all the way to CERN via Chicago.
See http://obgp.canet3.net/ for the proposed way of letting users set up circuits.
Working for UCAID/Internet 2/Abilene to provide research communities with the best connectivity available and to support innovative applications and advanced services (e.g. multicast, IPv6, QoS, measurement, security, jumbo frames) not possible over the commercial Internet, through a high performance IP common bearer service. It is not directly federally funded. It is a partnership with QWEST (provided SONET & soon DWDM), Nortel (SONET kit to QWEST), Cisco (routers), Indiana U (NOC), and the ITECs in N. Carolina & Ohio (test & evaluation).
IP over SONET (OC48c) backbone with 53 direct connections; they have a 1GE trial (MREN), 3 OC48c connections, and 23 will connect via OC12c by 1Q02, with the number of ATM connections decreasing. There are 210 participants (research universities and labs) across all 50 states, DC & Puerto Rico; 15 regional GigaPoPs support 70% of participants; there are 39 sponsored participants and 21 state education networks. In expanding access they were careful to ensure they did not hurt the innovative applications and advanced services. International peering is at LA, SNV, CHI, SEA, HOU, ATL, NYC. There is transit over Abilene (e.g. Australia to Belgium). It is running at 30% utilization today. They support 9KByte frames on the backbone, so MTU is now an end-system issue. SC99: 622Mbps; SC00: 2.5Gbps; SC01: 5Gbps; SC02: 10Gbps (planned). They have tested packetized raw HDTV at 1.5Gbps at SC01, SEA-DEN via L3 on OC-48c SONET.
DARPA PIs, SEA to the DC area, 1/6/02: 18 hrs of single-stream raw HD/IP UDP jumbo (4.55KB) packets, no losses, 15 re-sequencing episodes, i.e. loss < 8x10^-10 (90% confidence level), reordering 5x10^-9. This is OK for TCP.
Netflow passive study of bulk transfer for large flows >10MB.
Abilene now has a commitment to Oct 2006. The backbone will go to 10GE in 2002-2003. The new backbone will run IPv6 natively, concurrent with IPv4. Abilene circuits will not be protected, unlike SONET. They are adding new measurement capabilities: enhanced active probing (Surveyor) of latency, jitter, loss and TCP throughput; passive measurement taps; and support for computer science research ("Abilene Observatories"), providing data to researchers.
Pacific (=> National) Light Rail including CENIC/ONI & P/NW, ANL/TeraGrid and UCAID is an emerging & expanding collaboration to develop a persistent advanced optical network infrastructure capability to serve the diverse needs of the US higher education & research communities.
The mission is to ensure infrastructure is in place to meet the needs of the HENP program, with broad applicability to other fields. It is very synergistic with the I2 E2E group. They have 88 members on the email list; membership is a mix of US scientists, and they want to expand more internationally. Evolving integrated applications (i.e. Grids) rely on networks. There are 9 goals, which are part of the charter: support the development and deployment of high throughput tools and network measurements (Iosif Legrand/Les Cottrell); share information and provide advice on the configuration of routers, switches, PCs and network interfaces, and on testing & problem resolution, to achieve high performance; work with the staffs of international, national, regional and local grid facilities (Les Cottrell); liaise with grid projects to ensure that the US & global infrastructure meet needs; work with network engineering staffs to help define the requirements & operational procedures for a Grid global operations center so the ensemble of nets is able to work efficiently and provide the high level of capability required; and investigate new network technologies, see how they could be used, and develop a strategic plan for coordinated deployment (Phil DeMar/Bill St Arnaud).
Important for Internet 2 to work/collaborate with HENP field. There have been 2 official HENP WG meetings and 3rd is co-scheduled with I2 meeting May 6-8. Will need PolyCom viewstation, 2 overhead projectors and H.323.
The LANs are no longer leading the WANs in performance and technology.
Next meeting May 24th at SLAC.