Throughput Performance between SLAC and CERN
Les Cottrell. Page created: May 2, 2000, last update: June 24, 2000.
FERMI -> CERN : MAX 30.1 Mbps
CERN -> FERMI : MAX 30.1 Mbps
SLAC -> CERN  : MAX 15 Mbps
CERN -> SLAC  : MAX 26 Mbps
FERMI -> SLAC : MAX 29.6 Mbps
SLAC -> FERMI : MAX 24.2 Mbps

Thus it appears we can get less bandwidth from SLAC to anywhere than from anywhere to SLAC.
Further tests were made on May 2nd, 2000; this time Gilles waited for the traffic leaving or entering SLAC to be very low. A third set of tests started Monday 1st May at 0:25 (US SLAC time). The plot below shows the traffic at this time:
The measurement results (15 transfers of a single 1467582976-byte file for SLAC->CERN and 17 transfers of the same file for CERN->SLAC) showed:
cottrell@flora01:~>sudo /afs/slac/g/scs/bin/pathchar cernh9-s5-0.cern.ch
Password:
pathchar to cernh9-s5-0.cern.ch (192.65.184.142)
 mtu limitted to 1500 bytes at FLORA01.SLAC.Stanford.EDU (134.79.16.29)
 doing 32 probes at each of 64 to 1500 by 44
 0 FLORA01.SLAC.Stanford.EDU (134.79.16.29)
 |    30 Mb/s,   197 us (797 us)
 1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2)
 |    54 Mb/s,   193 us (1.41 ms)
 2 RTR-CGB6.SLAC.Stanford.EDU (134.79.135.6)
 |   112 Mb/s,    65 us (1.64 ms)
 3 RTR-DMZ.SLAC.Stanford.EDU (134.79.111.4)
 |   106 Mb/s,   -43 us (1.67 ms)
 4 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)
 |    30 Mb/s,  27.9 ms (57.8 ms)
 5 chicago1-atms.es.net (134.55.24.17)
 |    28 Mb/s,   543 us (59.4 ms)
 6 206.220.243.32 (206.220.243.32) -> 206.220.243.32 (2)
 |    29 Mb/s,  56.3 ms (172 ms), 1% dropped
 7?cernh9-s5-0.cern.ch (192.65.184.142)
7 hops, rtt 170 ms (172 ms), bottleneck 28 Mb/s, pipe 593027 bytes
ccdevsn1:csh[10] pathchar tersk01.slac.stanford.edu
pathchar to tersk01.slac.stanford.edu (134.79.125.21)
 doing 32 probes at each of 64 to 1500 by 32
 0 localhost
 |    31 Mb/s,   284 us (0.96 ms)
 1 Lyon-ANDA.in2p3.fr (134.158.104.100)
 |    84 Mb/s,   120 us (1.34 ms)
 2 Lyon-TIF.in2p3.fr (134.158.240.6) -> 192.70.69.141 (1404) -> 192.70.69.13 (1423)
 |   1.5 Mb/s,  3.19 ms (15.6 ms)
 3?Cern1.in2p3.fr (192.70.69.10)
 |    19 Mb/s,   653 us (17.6 ms)
 4 cernh9.cern.ch (192.65.185.9)
 |    31 Mb/s,  55.9 ms (130 ms)
 5 ar1-chicago.cern.ch (192.65.184.141)
 |    27 Mb/s,   664 us (132 ms), +q 1.04 ms (3.56 KB) *2
 6 chicago-nap.es.net (206.220.243.85)
 |    30 Mb/s,  27.9 ms (188 ms), +q 1.02 ms (3.80 KB) *2
 7 slac1-atms.es.net (134.55.24.13)
 |    73 Mb/s,   210 us (188 ms), +q 1.03 ms (9.34 KB)
 8 RTR-DMZ.SLAC.Stanford.EDU (192.68.191.17)
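The summary line of the first pathchar trace above also illustrates the bandwidth-delay product: the reported "pipe" of 593027 bytes is roughly the bottleneck bandwidth (28 Mb/s) times the round-trip time (170 ms), i.e. the amount of data a single stream must keep in flight to fill the bottleneck. A minimal sketch of the arithmetic, using only the numbers from that trace:

    # Bandwidth-delay product ("pipe") check for the pathchar summary line above.
    bottleneck_bps = 28e6    # bottleneck bandwidth reported by pathchar (28 Mb/s)
    rtt_s = 0.170            # round-trip time reported by pathchar (170 ms)

    pipe_bytes = bottleneck_bps * rtt_s / 8   # convert bits to bytes
    print("pipe ~= %.0f bytes" % pipe_bytes)  # ~595000, close to the reported 593027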
Looking at the
Surveyor one-way
measurements for April 17th between SLAC and CERN, the effect of the
file transfer on the one-way delays is apparent (note the large increases
between 20:00 and 22:00 UTC, i.e. when Gilles was making his tests).
It may be significant that the impact
on the delays from CERN to SLAC is larger (an increase from 80 msec. to 220 msec.)
than from SLAC to CERN (an increase from 80 msec. to 110 msec.). During this time the
routes appeared to be stable, and no packet loss was reported by
Surveyor.
However, the bbftp thruput was still measured at about 16 Mbps with 10 streams, from tersk01.slac.stanford.edu to sunstats.cern.ch (the iperf thruput from tersk01 to sunstats was measured at ~25 Mbps).
From datamove3:
  -  2 threads, no compression:    7.5 Mb/s
  -  4 threads, no compression:   12.4 Mb/s
  -  5 threads, no compression:   13.6 Mb/s
  -  6 threads, no compression:   19.1 Mb/s
  -  8 threads, no compression:   20.1 Mb/s
  - 10 threads, no compression:   19.9 Mb/s
  -  5 threads, with compression: 13.6 Mb/s
  -  8 threads, with compression:  6.3 Mb/s
  - 10 threads, with compression:  7.4 Mb/s
From tersk02:
  - 10 threads, with compression: 26.4 Mb/s (second try: 25.1 Mb/s)
From shire01:
  - 10 threads, with compression: 37.9 Mb/s

So it appears that the network is fine; the throughput is limited by the datamove3 machine itself when we use compression. Dominique saw the load on datamove3 increase from ~5 up to ~17 when I was trying to use 10 threads with compression.
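As a rough way of seeing why compression can reduce the aggregate rate on a CPU-bound host: data can leave the machine no faster than it can be compressed, so the useful data rate is roughly the smaller of the network ceiling and the host's aggregate compression rate. The sketch below is only illustrative; the compression rate and ratio are assumptions, not measurements from datamove3.

    # Illustrative model of a compression-limited sender; all numbers are assumptions.
    network_ceiling_mbps = 25.0   # roughly what the path sustained without compression
    compress_input_mbps = 8.0     # assumed rate at which the host can push data through the compressor
    compress_ratio = 2.0          # assumed ratio: compressed output is half the input size

    wire_mbps = min(network_ceiling_mbps, compress_input_mbps / compress_ratio)
    data_mbps = wire_mbps * compress_ratio   # useful (uncompressed) data delivered per second
    print("wire ~= %.1f Mb/s, useful data ~= %.1f Mb/s" % (wire_mbps, data_mbps))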
Randy Melen reports 6/13/00:
I've watched datamove3 today and see 8 bbftp processes
running, though sleeping. I haven't seen the "load" go above 1.3, and typically
it is much lower. I see CPU usage typically less than 10%. I conclude that
nothing is being done on this system whenever I watch it! So I need to know
when these transfers are being done. When datamove3 was purchased, it was
expected to do the normal "datamove-like" work and so only 1GB of memory
was allocated. Maybe that's not enough now for this different kind of use.
Also it has 4 CPUs at 336MHz, relatively modest today, but probably sufficient
for "datamove-like" work -- but not enough if you're compressing multiple gigabytes
of data and shoving it out a Gb Ethernet interface that also soaks up CPU
capacity with a high interrupt level.
Two tests that would be interesting would be:
To ascertain the possible impact of the high performance testing on interactive applications that require low, consistent RTT and low loss, we ran IPERF between SLAC (oceanus.slac.stanford.edu) and CERN (sunstats.cern.ch) and simultaneously measured the ping RTT and loss from both ends. The results shown below indicate that we saw little difference in the RTT and loss whether or not we were running IPERF:
                               RTT (msec.)
                               Min    Avg    Max    Loss
  CERN -> SLAC without IPERF   189    251    435    0/380
  SLAC -> CERN without IPERF   181    221    424    0/100
  CERN -> SLAC with IPERF      180    288    405    2/604
  SLAC -> CERN with IPERF      182    289    397    4/607

The IPERF test used 15 streams and windows of 256 Kbytes and ran for 10 minutes. The test was started at 10:29am Saturday 6/24/00. Running IPERF we were able to get about 9.4 Mbps thruput.
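For context, a single TCP stream can carry at most one window of data per round trip, so with 256 Kbyte windows and the ~250 msec. average RTTs above, each stream could in principle reach about 8 Mb/s, and 15 streams far more than the 9.4 Mbps actually observed; the limit was presumably the already busy link rather than the TCP windows. A minimal sketch of the window/RTT arithmetic:

    # Window-limited TCP throughput: at most one window per round trip.
    window_bytes = 256 * 1024   # 256 Kbyte windows used in the IPERF test
    rtt_s = 0.25                # ~250 ms average RTT measured during the test
    streams = 15                # parallel IPERF streams

    per_stream_mbps = window_bytes * 8 / rtt_s / 1e6
    print("per stream ~= %.1f Mb/s" % per_stream_mbps)                       # ~8.4 Mb/s
    print("15 streams ~= %.0f Mb/s ceiling" % (per_stream_mbps * streams))   # vs 9.4 Mbps observed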
Looking at the Router utilization graph for the SLAC ESnet line
(see below),
it appears something else was making heavy continuous use of the link for
many hours: the outbound utilization from SLAC was over 70% for
at least 24 hours before our test was made (10:00am PDT) and throughout our test.
Further investigation revealed that Dominique Boutigny was running bbftp between SLAC
and IN2P3 at this time. He is using bbftp in production now to transfer
data to IN2P3 in Lyon, and in the week from 6/19/00 thru 6/25/00 he
successfully transferred between 300 and 350 GBytes.
Note that IN2P3 is accessed via CERN.
The SLAC ESnet link is also being heavily used to transfer Monte Carlo
data from INFN/Rome, IN2P3 and Colorado to SLAC, and for LLNL to write
directly into the Objectivity federation. In addition, Caltech will
soon start using the link heavily.
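For a sense of scale, 300 to 350 GBytes transferred in a week corresponds to an average of only a few Mbps, so the sustained high utilization on the link reflects the burst rates of these transfers plus the other traffic listed above, rather than the weekly volume alone. A quick sketch of the arithmetic (assuming decimal GBytes):

    # Average rate implied by the weekly bbftp production volume (decimal GBytes assumed).
    gbytes_per_week = 325.0          # midpoint of the reported 300-350 GBytes
    seconds_per_week = 7 * 24 * 3600

    avg_mbps = gbytes_per_week * 1e9 * 8 / seconds_per_week / 1e6
    print("average ~= %.1f Mb/s over the week" % avg_mbps)   # ~4.3 Mb/s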
Since somebody was making high performance tests when
we made the ping measurements without IPERF running, we need to look at the
longer term PingER RTTs and losses and see whether there is a correlation
with the utilization.
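One way to do this is to compute the correlation between the half-hourly PingER samples and the link utilization for the same intervals. The sketch below is a generic outline only; the sample values are hypothetical placeholders, since this page does not describe the actual PingER or router-utilization data formats.

    # Sketch: correlate half-hourly PingER RTT/loss with ESnet link utilization.
    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / (sxx * syy) ** 0.5

    # Hypothetical half-hourly samples for the same intervals (placeholders only).
    utilization_pct = [20, 35, 50, 65, 75, 80, 70, 40]
    rtt_ms = [180, 190, 210, 240, 270, 290, 260, 200]
    loss_pct = [0.0, 0.0, 0.3, 0.0, 0.3, 0.7, 0.3, 0.0]

    print("RTT  vs utilization: %.2f" % pearson(utilization_pct, rtt_ms))
    print("loss vs utilization: %.2f" % pearson(utilization_pct, loss_pct))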
The PingER RTT (red) and loss (blue) results
(one point per half-hour interval) for SLAC to CERN for
Monday 19th June thru Sunday 25th June are shown below.
It can be seen that there is a noticeable impact on the RTT; however, the loss
rates appear to be minimally impacted by the utilization.