Low Throughput to Daresbury Despite the Upgrade to a 1000Mb Link

Jerrod D. Williams. Page created: June 11, 2003

Introduction

On Monday, June 9th, we noticed that Daresbury had upgraded its Internet connection to a 1000Mb link. The change showed up in Jiri's Available Bandwidth Estimator, referenced in this case study: "The influence of strong LAN traffic on target hosts on monitoring results". More noteworthy to us is that we are still unable to achieve transfers greater than 100Mb/s over the new link, roughly the same rates as before the upgrade.

Problem Reports

With the upgrade to a 1000Mb/s link to the Internet, we expected to see a large improvement in our throughput to Daresbury as measured via iperf. This was not the case: throughput continued to average around 80Mb/s. The reverse path, from the UK (DL) to SLAC, averages the same throughput. The connection within the UK between CLRC and DL is OK: iperf from CLRC to DL measures about 360Mb/s, and from DL to CLRC about 300Mb/s. The following graph illustrates transfers between SLAC and DL:

This pointed to a bottleneck somewhere on the path between SLAC and Daresbury. We tested whether the poor performance was isolated to this machine or whether other machines in the area were also affected. From other parts of Europe, throughput varies with the path.
To/from CESNET (Prague, Czech Republic), transfers were okay. We measured:
165-172 Mb/s from Prague (cesnet.cz) to DL
171-207 Mb/s from DL to Prague
with the usual fast response, each test finishing within roughly the time interval + 0.5 sec.

From nikhef.nl, we measured 214Mb/s.

We conducted tests from SLAC to CLRC and found that CLRC delivers only around 40Mb/s, although it sits on a 1000Mb link.
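
To put the figures quoted so far side by side, the following minimal Python sketch expresses each measured throughput as a percentage of a nominal 1000 Mb/s link. The numbers are the ones quoted in this report; the `utilization` helper and the layout of the `measurements` list are illustrative only. Comparing every path against 1000 Mb/s is an assumption, since the text states that capacity only for the DL and CLRC links.

```python
LINK_MBPS = 1000  # nominal capacity stated in the text for the DL and CLRC links

# (path, low Mb/s, high Mb/s) as quoted in this case study.
# A single quoted value is stored as low == high.
measurements = [
    ("SLAC <-> DL", 80, 80),
    ("CLRC -> DL", 360, 360),
    ("DL -> CLRC", 300, 300),
    ("Prague -> DL", 165, 172),
    ("DL -> Prague", 171, 207),
    ("nikhef.nl", 214, 214),
    ("SLAC -> CLRC", 40, 40),
]


def utilization(mbps, capacity=LINK_MBPS):
    """Return throughput as a percentage of the nominal link capacity."""
    return 100.0 * mbps / capacity


for path, low, high in measurements:
    if low == high:
        print(f"{path:14s} {low:4d} Mb/s       {utilization(low):5.1f}%")
    else:
        print(f"{path:14s} {low:4d}-{high} Mb/s   "
              f"{utilization(low):5.1f}-{utilization(high):.1f}%")
```

On these figures the transatlantic paths from SLAC use under a tenth of the nominal capacity, while the intra-European paths reach roughly a fifth to a third.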

We then compared the traceroutes to these sites to identify any similarities between CLRC and Daresbury and to isolate a particular link that may be the cause of the bottleneck. These two machines do in fact have similar routes, while Nikhef's is very different. The routes to Daresbury and CLRC are identical from SLAC to Ja.net via Esnet, and only then branch off to the individual sites.
4iepm@hercules:~>traceroute DEV04.HEPGRID.CLRC.AC.UK
traceroute to DEV04.HEPGRID.CLRC.AC.UK (130.246.200.1), 30 hops max, 38 byte packets
 1  rtr-gsr-test (134.79.243.1)  0.176 ms  0.140 ms  0.199 ms
 2  rtr-dmz1-ger (134.79.135.15)  0.331 ms  0.271 ms  0.226 ms
 3  slac-rt4.es.net (192.68.191.146)  0.354 ms  0.311 ms  0.346 ms
 4  snv-pos-slac.es.net (134.55.209.1)  0.683 ms  0.696 ms  0.704 ms
 5  chicr1-oc192-snvcr1.es.net (134.55.209.54)  48.998 ms  48.920 ms  48.901 ms
 6  aoacr1-oc192-chicr1.es.net (134.55.209.58)  69.034 ms  68.969 ms  68.943 ms
 7  esnet.ny1.ny.geant.net (62.40.103.213)  69.067 ms  68.971 ms  69.047 ms
 8  ny.uk1.uk.geant.net (62.40.96.170)  136.608 ms  136.639 ms  136.615 ms
 9  janet-gw.uk1.uk.geant.net (62.40.103.150)  136.728 ms  136.626 ms  136.613 ms
10  lond-scr3.ja.net (146.97.37.81)  136.736 ms  136.760 ms  136.618 ms
11  po6-0.read-scr.ja.net (146.97.33.13)  138.370 ms  138.378 ms  138.248 ms
12  po2-0.ral-bar.ja.net (146.97.35.158)  138.753 ms  138.788 ms  138.750 ms
13  146.97.40.74 (146.97.40.74)  139.233 ms  139.071 ms  139.081 ms
14  192.100.78.46 (192.100.78.46)  138.947 ms  138.999 ms  138.954 ms
15  192.100.78.46 (192.100.78.46)  138.952 ms  138.871 ms  140.131 ms
5iepm@hercules:~>traceroute rtlin1.dl.ac.uk
traceroute to rtlin1.dl.ac.uk (193.62.119.20), 30 hops max, 38 byte packets
 1  rtr-gsr-test (134.79.243.1)  0.229 ms  0.155 ms  0.129 ms
 2  rtr-dmz1-ger (134.79.135.15)  0.310 ms  0.266 ms  0.338 ms
 3  slac-rt4.es.net (192.68.191.146)  0.339 ms  0.266 ms  0.342 ms
 4  snv-pos-slac.es.net (134.55.209.1)  0.689 ms  0.747 ms  0.693 ms
 5  chicr1-oc192-snvcr1.es.net (134.55.209.54)  49.004 ms  48.939 ms  48.890 ms
 6  aoacr1-oc192-chicr1.es.net (134.55.209.58)  68.951 ms  68.936 ms  68.941 ms
 7  esnet.ny1.ny.geant.net (62.40.103.213)  69.060 ms  68.985 ms  69.006 ms
 8  ny.uk1.uk.geant.net (62.40.96.170)  136.627 ms  136.728 ms  136.766 ms
 9  janet-gw.uk1.uk.geant.net (62.40.103.150)  136.714 ms  136.722 ms  136.606 ms
10  lond-scr3.ja.net (146.97.37.81)  136.722 ms  136.679 ms  136.606 ms
11  po6-0.read-scr.ja.net (146.97.33.13)  138.376 ms  138.319 ms  138.379 ms
12  po3-0.warr-scr.ja.net (146.97.33.54)  142.246 ms  142.075 ms  142.478 ms
13  po1-0.manchester-bar.ja.net (146.97.35.166)  142.472 ms  142.432 ms  142.355 ms
14  gw-nnw.core.netnw.net.uk (146.97.40.202)  142.610 ms  142.558 ms  142.648 ms
15  gw-fw.dl.ac.uk (193.63.74.233)  143.407 ms  143.147 ms  143.180 ms
16  alan3.dl.ac.uk (193.63.74.129)  150.683 ms  143.722 ms  144.240 ms
17  rtlin1.dl.ac.uk (193.62.119.20)  143.756 ms  143.858 ms  143.773 ms
5iepm@hercules:~>traceroute iepm-bw.cesnet.cz
traceroute to iepm-bw.cesnet.cz (195.113.187.2), 30 hops max, 38 byte packets
 1  rtr-gsr-test (134.79.243.1)  0.235 ms
 2  rtr-dmz1-ger (134.79.135.15)  0.349 ms
 3  slac-rt4.es.net (192.68.191.146)  0.397 ms
 4  snv-pos-slac.es.net (134.55.209.1)  0.749 ms
 5  chicr1-oc192-snvcr1.es.net (134.55.209.54)  49.088 ms
 6  aoacr1-oc192-chicr1.es.net (134.55.209.58)  69.040 ms
 7  esnet.ny1.ny.geant.net (62.40.103.213)  69.049 ms
 8  ny.uk1.uk.geant.net (62.40.96.170)  136.739 ms
 9  uk.fr1.fr.geant.net (62.40.96.89)  144.040 ms
10  fr.de1.de.geant.net (62.40.96.49)  152.508 ms
11  de.cz1.cz.geant.net (62.40.96.37)  160.975 ms
12  cesnet-gw.cz1.cz.geant.net (62.40.103.30)  161.703 ms
13  r41prg-pos13-0-stm16.cesnet.cz (195.113.156.110)  236.223 ms
 .
30  *
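
The route comparison above was done by eye. As a rough illustration of the same idea, the Python sketch below parses traceroute output and reports the leading hops that two paths share. The function names `parse_hops` and `shared_prefix` are hypothetical helpers, not part of any existing tool, and the sample data is a short excerpt (hops 8 onward) copied from the listings above rather than the full traces.

```python
import re

# Matches lines such as " 9  janet-gw.uk1.uk.geant.net (62.40.103.150)  136.7 ms"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")


def parse_hops(text):
    """Return a list of (hop_number, name, ip) tuples found in traceroute output."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops


def shared_prefix(path_a, path_b):
    """Return the leading hops that both paths have in common, compared by IP."""
    common = []
    for hop_a, hop_b in zip(path_a, path_b):
        if hop_a[2] != hop_b[2]:
            break
        common.append(hop_a)
    return common


# Excerpts from the traceroutes above, starting at hop 8.
CLRC = """
 8  ny.uk1.uk.geant.net (62.40.96.170)  136.608 ms
 9  janet-gw.uk1.uk.geant.net (62.40.103.150)  136.728 ms
10  lond-scr3.ja.net (146.97.37.81)  136.736 ms
11  po6-0.read-scr.ja.net (146.97.33.13)  138.370 ms
12  po2-0.ral-bar.ja.net (146.97.35.158)  138.753 ms
"""
DL = """
 8  ny.uk1.uk.geant.net (62.40.96.170)  136.627 ms
 9  janet-gw.uk1.uk.geant.net (62.40.103.150)  136.714 ms
10  lond-scr3.ja.net (146.97.37.81)  136.722 ms
11  po6-0.read-scr.ja.net (146.97.33.13)  138.376 ms
12  po3-0.warr-scr.ja.net (146.97.33.54)  142.246 ms
"""
CESNET = """
 8  ny.uk1.uk.geant.net (62.40.96.170)  136.739 ms
 9  uk.fr1.fr.geant.net (62.40.96.89)  144.040 ms
10  fr.de1.de.geant.net (62.40.96.49)  152.508 ms
"""

clrc, dl, cesnet = parse_hops(CLRC), parse_hops(DL), parse_hops(CESNET)
print("CLRC/DL last shared hop:    ", shared_prefix(clrc, dl)[-1][1])
print("CLRC/CESNET last shared hop:", shared_prefix(clrc, cesnet)[-1][1])
```

On these excerpts the CLRC and DL paths stay together through po6-0.read-scr.ja.net (hop 11), while the CESNET path leaves them after ny.uk1.uk.geant.net (hop 8), before the janet-gw.uk1.uk.geant.net hop.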

Current Beliefs and Questions

I believe the bottleneck is encountered once we enter Europe, at the UK-NL and UK-NY links. We don't think this is specific to DL; it probably affects all UK sites, though we don't have another UK site to confirm this. Tests to Cesnet indicate that Geant is providing good links in that direction, but when we traverse the janet-gw.uk1.uk.geant.net (62.40.103.150) link towards CLRC and DL after entering Europe, we see degraded readings. To fully utilize the connection between the US and the UK, the bottleneck links must be upgraded.