
Throughput Performance between SLAC and ILAN (Israel)

Les Cottrell. Page created: September 3, 2000.


Introduction

On August 23, 2000 Marek Karliner sent email saying: For the last few days the file transfer rate to the ac.il domain has been extremely low, between 3 to 10 kbytes/sec, while in the past we were able to go up to 600 kbytes/sec.

PingER measurements

The packet loss (measured by PingER) appears to have increased from < 1% to 2-3% between Aug 15 & 16. This is visible on links between buproxy.ac.il and es.net, hep.net, jlab.org, doe.gov, and slac.stanford.edu, but between buproxy.ac.il and anl.gov, cern.ch, desy.de and stanford.edu, the effect is not visible. The difference appears to be that the former sites peer with ESnet and thence to DANTE, whereas the latter (including ANL) do not use ESnet to get to ILAN. This might account for some of the degradation in performance and help with pinpointing the cause. At the same time the RTT between SLAC and buproxy.ac.il dropped from about 600 msec to about 420 msec, so perhaps the route changed.
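The reported throughput collapse is roughly consistent with the Mathis et al. approximation for loss-limited TCP throughput, rate ≈ MSS / (RTT × √loss). A back-of-the-envelope sketch, assuming a 1460-byte MSS (an assumption, not from the measurements) and plugging in the PingER figures above:

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss):
    """Approximate loss-limited TCP throughput (Mathis et al.):
    rate ~ MSS / (RTT * sqrt(loss)), in bytes/sec."""
    return mss_bytes / (rtt_s * math.sqrt(loss))

# Assumed MSS = 1460 bytes; RTT ~420 ms and loss rates from the PingER data.
before = mathis_throughput(1460, 0.420, 0.001)  # < 1% loss: take 0.1%
after = mathis_throughput(1460, 0.420, 0.025)   # 2-3% loss: take 2.5%
print(f"before: ~{before / 1000:.0f} kbytes/s, after: ~{after / 1000:.0f} kbytes/s")
```

The formula only bounds the achievable rate from above, so it overestimates the observed 3-10 kbytes/sec, but it shows the order-of-magnitude drop one would expect from a jump of loss into the 2-3% range at a ~420 msec RTT.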

Routes

The route from ILAN on September 3, 2000 appeared as follows:
Tracing the route to WWWSLUG.SLAC.Stanford.EDU (134.79.18.131)
  1 gp1-mag.ilan.net.il (128.139.198.80) 4 msec 4 msec 4 msec
  2 tau-gp2-fe-i2.ilan.net.il (192.114.99.33) 4 msec 4 msec 4 msec
  3 chi-gp3-0.ilan.net.il (192.114.99.65) 560 msec 556 msec 556 msec
  4 chi-gp4-fe-i2.ilan.net.il (192.114.101.34) 560 msec 556 msec 556 msec
  5 ESnet-ILAN.ilan.net.il (192.114.98.33) 372 msec 372 msec 372 msec
  6 slac1-atms.es.net (134.55.24.13) 420 msec 424 msec 556 msec
  7 RTR-DMZ.SLAC.Stanford.EDU (192.68.191.17) 424 msec 420 msec 420 msec
The route from SLAC (and the ASes traversed), as seen by the SLAC reverse traceroute server, is shown below.
 4  ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18) [AS32 - Stanford Linear Accelerator Center]  2.38 ms (ttl=252)
 5  pppl4-atms.es.net (134.55.24.10) [AS293 - Energy Sciences Network (ESnet)]  60.6 ms (ttl=251)
 6  60hudson-pppl.es.net (134.55.43.2) [AS293 - Energy Sciences Network (ESnet)]  64.1 ms (ttl=249!)
 7  esnet.ny.dante.net (212.1.200.217) [AS9010 - TEN-155/TEN-US Backbone]  64.2 ms (ttl=249)
 8  ny2-ny3.ny2.ny.dante.net (212.1.200.110) [AS9010 - TEN-155/TEN-US Backbone]  64.8 ms (ttl=248)
 9  il-us.il.dante.net (212.1.200.70) [AS9010 - TEN-155/TEN-US Backbone]  417 ms (ttl=248!)
10  tau-gp1-fe-i1.ilan.net.il (192.114.99.50) [AS701 - Architectual & Computer Aids]  417 ms (ttl=247!)
11  buproxy.iucc.ac.il (128.139.197.25) [AS378 - ILAN-AND-HUJI]  418 ms (ttl=246!)
Pingroute from SLAC to buproxy.ac.il on September 3 is shown below. It appears to indicate a problem for large (1400 byte) packets between hops 8 and 9 (probably in DANTE), even though the link is lightly loaded (see http://noc.ilan.net.il/stats/TAU-GIGAPOP/il-us.il.dante.net.html). The loss pattern (low for small packets, high for large packets) may hint at an ATM issue resulting in cell loss.
6cottrell@flora01:~>bin/pingroute.pl -c 100 buproxy.ac.il
Sun Sep  3 10:18:45 2000
 Architecture=SUN5, commands=traceroute -q 1 and ping -s node 1400 100, pingroute.pl version=1.4, 5/16/00, debug=1
pingroute.pl version 1.4, 5/16/00 using traceroute to get nodes in route from flora01 to buproxy.ac.il
traceroute: Warning: ckecksums disabled
traceroute to buproxy.iucc.ac.il (128.139.197.25), 30 hops max, 40 byte packets
pingroute.pl version 1.4, 5/16/00 found 11 hops in route from flora01 to buproxy.ac.il
4  ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)  0.974 ms
5  pppl4-atms.es.net (134.55.24.10)  60.462 ms
6  60hudson-pppl.es.net (134.55.43.2)  62.903 ms
7  esnet.ny.dante.net (212.1.200.217)  63.676 ms
8  ny2-ny3.ny2.ny.dante.net (212.1.200.110)  64.363 ms
9  il-us.il.dante.net (212.1.200.70)  416.979 ms
10  tau-gp1-fe-i1.ilan.net.il (192.114.99.50)  416.958 ms
11  buproxy.iucc.ac.il (128.139.197.25)  417.550 ms
Wrote 11 addresses to /tmp/pingaddr, now ping each address 100 times from flora01
         pings/node=100                              100 byte packets           1400 byte packets
         NODE                                  %loss    min    max    avg %loss   min    max    avg from flora01
192.68.191.18   ESNET-A-GATEWAY.SLAC.STANFORD.    0%    0.0  144.0    3.0   0%    2.0  207.0    5.0 Sun Sep  3 10:28:41 PDT 2000
134.55.24.10    PPPL4-ATMS.ES.NET                 0%   60.0  257.0   63.0   0%   62.0  173.0   65.0 Sun Sep  3 10:31:59 PDT 2000
134.55.43.2     60HUDSON-PPPL.ES.NET              0%   62.0   66.0   62.0   0%   66.0   67.0   66.0 Sun Sep  3 10:35:17 PDT 2000
212.1.200.217   ESNET.NY.DANTE.NET                0%   63.0  297.0   67.0   0%   66.0  213.0   70.0 Sun Sep  3 10:38:35 PDT 2000
212.1.200.110   NY2-NY3.NY2.NY.DANTE.NET          0%   64.0  278.0   70.0   0%   68.0  253.0   71.0 Sun Sep  3 10:41:53 PDT 2000
212.1.200.70    IL-US.IL.DANTE.NET                1%  416.0  463.0  417.0  22%  422.0  424.0  422.0 Sun Sep  3 10:45:12 PDT 2000
192.114.99.50   TAU-GP1-FE-I1.ILAN.NET.IL         0%  417.0  421.0  417.0   5%  423.0  425.0  423.0 Sun Sep  3 10:48:32 PDT 2000
128.139.197.25  BUPROXY.IUCC.AC.IL                0%  417.0  430.0  418.0  21%  423.0  440.0  424.0 Sun Sep  3 10:51:51 PDT 2000
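The ATM hypothesis fits the loss pattern above arithmetically: a 1400-byte packet spans about 30 ATM cells (48-byte cell payloads), so a modest per-cell loss rate is strongly amplified into per-packet loss, while a 100-byte packet spans only 3 cells. A rough sketch, assuming independent cell losses and ignoring AAL5 framing overhead:

```python
import math

CELL_PAYLOAD = 48  # payload bytes per 53-byte ATM cell (AAL5 overhead ignored)

def cells(packet_bytes):
    """Number of cells needed to carry a packet of the given size."""
    return math.ceil(packet_bytes / CELL_PAYLOAD)

def packet_loss(p_cell, packet_bytes):
    # A packet is lost if any one of its cells is lost (independence assumed).
    return 1 - (1 - p_cell) ** cells(packet_bytes)

# Infer the per-cell loss implied by the observed 22% loss of 1400-byte pings
# at hop 9 (il-us.il.dante.net).
p_cell = 1 - (1 - 0.22) ** (1 / cells(1400))
print(f"implied per-cell loss: {p_cell:.3%}")
print(f"predicted loss for 100-byte pings: {packet_loss(p_cell, 100):.1%}")
```

Under these assumptions a per-cell loss under 1% predicts small-packet loss of a few percent at most, the same order as the ~1% observed for 100-byte pings, while producing the ~22% loss seen for 1400-byte pings.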

Path characterization

We used pathchar to characterize the route. The results are shown below.
3cottrell@flora02:~>pathchar buproxy.ac.il
pathchar to buproxy.iucc.ac.il (128.139.197.25)
 mtu limitted to 1500 bytes at FLORA02.SLAC.Stanford.EDU (134.79.16.57)
 doing 32 probes at each of 64 to 1500 by 44
 0 Host
 |    25 Mb/s,   211 us (0.90 ms)
 1 router1 
 |    93 Mb/s,   175 us (1.38 ms)
 2 router2
 |    91 Mb/s,   55 us (1.62 ms)
 3 border router
                        -> 134.79.111.4 (22920)           
 |   111 Mb/s,   -111 us (1.51 ms)
 4?ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)
                        -> 192.68.191.18 (1)           
 |    27 Mb/s,   29.8 ms (61.5 ms)
 5?pppl4-atms.es.net (134.55.24.10)
                        -> 134.55.24.10 (1)           
 |    27 Mb/s,   1.08 ms (64.1 ms)
 6?60hudson-pppl.es.net (134.55.43.2)
                        -> 134.55.43.2 (1)           
 |    43 Mb/s,   408 us (65.2 ms)
 7?esnet.ny.dante.net (212.1.200.217)
                        -> 212.1.200.217 (1)           
 |    53 Mb/s,   260 us (66.0 ms)
 8?ny2-ny3.ny2.ny.dante.net (212.1.200.110)
 |   8.3 Mb/s,   176 ms (420 ms),  7% dropped
 9 il-us.il.dante.net (212.1.200.70)
                        -> 212.1.200.70 (4)           
 |    36 Mb/s,   84 us (421 ms),  8% dropped
10?tau-gp1-fe-i1.ilan.net.il (192.114.99.50)
                        -> 192.114.99.50 (4)           
 |    27 Mb/s,   2 us (421 ms),  8% dropped
11?buproxy.iucc.ac.il (128.139.197.25)
11 hops, rtt 417 ms (421 ms), bottleneck 8.3 Mb/s, pipe 436669 bytes
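The "pipe" figure pathchar reports is the bandwidth-delay product of the bottleneck bandwidth and the round-trip time, i.e. the amount of data that must be in flight to keep the path full (and hence roughly the TCP window needed). Checking it against the numbers above:

```python
# Bandwidth-delay product for the pathchar results above:
# bottleneck 8.3 Mb/s, round-trip time 421 ms.
bottleneck_bps = 8.3e6
rtt_s = 0.421
pipe_bytes = bottleneck_bps * rtt_s / 8
print(f"pipe = {pipe_bytes:.0f} bytes")  # close to pathchar's reported 436669 bytes
```

A TCP connection with a window smaller than this pipe size is limited to window/RTT regardless of loss, so the long ~420 msec RTT alone already demands a large window to approach the 8.3 Mb/s bottleneck.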

Possible Causes

On September 3 Rafi Sadowsky of ILAN sent email to routing@es.net saying: The basic problem was reported as poor TCP performance between TAU (Tel-Aviv University) and SLAC (at Stanford U). After some digging around, it seems that you're sending traffic to AS378 (us) via AS9010 rather than via our direct peering at STAR-TAP (Chicago).

Late on September 3, email from Joe Burrescia of the ESnet NOC said: I believe this trouble can be traced to the ESnet upgrade of our router connecting to DANTE from a Cisco to a Juniper. The BGP syntax is just different enough that we missed matching a local-pref, and the route through DANTE was preferred. I've corrected the problem and the route is now once again preferring our peering at Chicago. I apologize for all the trouble this has caused. The route from SLAC to ILAN on September 4th is shown below; at this time the route went via the Chicago STAR-TAP.

 4  ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18) [AS32 - Stanford Linear Accelerator Center]  1.33 ms (ttl=252)
 5  chicago1-atms.es.net (134.55.24.17) [AS293 - Energy Sciences Network (ESnet)]  57.6 ms (ttl=251)
 6  ILAN-ESnet.ilan.net.il (192.114.98.34) [AS701 - Cimatron CAD/CAM Systems]  58.9 ms (ttl=250)
 7  chi-gp3-fe-i2.ilan.net.il (192.114.101.33) [AS4617 - Amdocs, Inc.]  59.1 ms (ttl=249)
 8  tau-gp2-s0.ilan.net.il (192.114.99.66) [AS701 - Architectual & Computer Aids]  611 ms (ttl=248)
 9  tau-gp1-fe-i2.ilan.net.il (192.114.99.34) [AS701 - Architectual & Computer Aids]  612 ms (ttl=247)
10  buproxy.iucc.ac.il (128.139.197.25) [AS378 - ILAN-AND-HUJI]  611 ms (ttl=246)
Email from Oded Comay on September 4th said: Fixing the ESnet routing scheme probably solved the problem we had with SLAC. However, we seem to have a much larger problem with DANTE, which is about to become our only link for a few weeks now. The problem is that although the link is not heavily loaded (see http://noc.ilan.net.il/stats/TAU-GIGAPOP/il-us.il.dante.net.html), we suffer a high loss rate. The loss pattern (low for small packets, high for large packets) may hint at an ATM issue, which results in cell loss. I suggest we take a look at our DANTE peer router and verify a correct setting.

Further email from Rafi Sadowsky on September 4th said: P.S. the drops from DANTE-NY to ILAN are probably due to an ATM policing problem which seems to be kicking in before the guaranteed PVC capacity.

Email from Yaron Zabary on September 4th said: After I spoke with Marek over the phone, it turned out he thought that the degraded performance was because the sat T3 line was down. After explaining that the line is up but without the Mentat SkyX boxes, we agreed that the bandwidth (~30Kb/s) was reasonable for these conditions.


Page owner: Les Cottrell