Packet loss on Stanford-Pac Bell DSL links - Mar 2001

Les Cottrell. Page created: March 1, 2001.

Introduction

Teresa Downey reported the following on Wednesday, February 28, 2001 at 7:27 pm:
I am fed up. I want this DSL cancelled if they cannot fix it. 
What are my options? 

Everynite since Sunday the service goes bad in the early evening 6-7. It might have moments where it is decent but for the most part it is unusable in the evenings. Last nite at 1730 I did a traceroute. At that time the only thing I was capable of doing was ssh into slac to edit a file and mail from command line. Pine would not work, neither would citrix.

->more traceroute.1933
traceroute 171.66.181.249
traceroute: Warning: ckecksums disabled
traceroute to 171.66.181.249 (171.66.181.249), 30 hops max, 40 byte packets
 3 192.68.191.83 (192.68.191.83) 0.504 ms 0.437 ms 0.430 ms
 4 * Core3-gateway.Stanford.EDU (171.64.1.222) 0.950 ms 0.658 ms
 5 Core1-gateway.Stanford.EDU (171.64.3.67) 0.932 ms 0.931 ms 0.957 ms
 6 forsythemr-gateway.Stanford.EDU (171.64.1.90) 1.772 ms 1.562 ms 1.288 ms
 7 dsl-gateway-1.Stanford.EDU (171.64.6.66) 1.435 ms 1.235 ms 1.238 ms
stalled here for about 30 seconds....
 8 172.22.90.254 (172.22.90.254) 16.641 ms 16.554 ms 16.397 ms

Packet losses & RTT

Les Cottrell monitored Teresa's Pac Bell-supplied Cayman DSL router at 171.66.181.249 for many hours, using pings sent 1 second apart. Analysis of the data indicated that adjacent pings were much more likely to be lost together than a simple random (independent) distribution of ping losses would predict. In 40,000 pings there were fifteen occasions where the losses extended over 17 seconds. The losses also varied strongly with time: in one hour the path would show 10% loss, whereas in other hours there would be less than 2.5% loss or none at all. The min/avg/max RTTs were 15.6/27.7/690 ms.
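The comparison with a random loss distribution can be illustrated with a short sketch. It is not the script used for the analysis above. It assumes the ping results are available as a list of booleans, one per 1-second ping (True means a reply was received), and the function names `loss_runs` and `loss_probabilities` are hypothetical. If losses were independent, the probability of a loss immediately following another loss would be about the same as the overall loss rate; a much larger value indicates bursty loss.

```python
from itertools import groupby


def loss_runs(received):
    """Return the length of every run of consecutive lost pings."""
    return [
        sum(1 for _ in group)
        for lost, group in groupby(received, key=lambda ok: not ok)
        if lost
    ]


def loss_probabilities(received):
    """Return (overall loss rate, P(ping i+1 lost | ping i lost))."""
    lost = [not ok for ok in received]
    p_loss = sum(lost) / len(lost)
    # Count lost pings that have a successor, and how often that successor is also lost.
    lost_with_successor = sum(1 for a in lost[:-1] if a)
    lost_pairs = sum(1 for a, b in zip(lost, lost[1:]) if a and b)
    p_cond = lost_pairs / lost_with_successor if lost_with_successor else 0.0
    return p_loss, p_cond


if __name__ == "__main__":
    # Toy sequence (assumed data, not the measured pings): one burst of 4 and one single loss.
    sample = [True] * 10 + [False] * 4 + [True] * 5 + [False] + [True] * 5
    p_loss, p_cond = loss_probabilities(sample)
    print("loss runs:", loss_runs(sample))
    print("overall loss rate: %.2f" % p_loss)
    print("loss rate right after a loss: %.2f" % p_cond)
```

With pings sent 1 second apart, a run length of n corresponds to an outage of roughly n seconds, so long outages show up directly as large entries in the list returned by `loss_runs`.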

More details on the analysis of the 40,000 pings show the distribution of:

It is also interesting that a gap of 10 packets observed at 18:06 on Saturday March 3, 2001, was correlated with Teresa rebooting her router.

The PingER history of RTT and losses shows clearly the lack of any accessibility for most of March 4th (GMT), and the marked improvement in loss afterwards.

Pingroute

Unfortunately for the current debugging purposes, it appears that some of the campus internal routers do not respond to ping and thus report 100% loss. A pingroute to Teresa Downey's home machine from SLAC illustrates this. Stephen Tingley of campus explains: We do not advertise the networks in the backbone outside of Stanford, nor do we advertise the private IP addresses we use, that's why those pings had loss. We don't feel there's a need for people outside of Stanford to ping our backbone routers.

The fact that the router DSL-GATEWAY-1.STANFORD.EDU responds to all pings suggests that the non-responsiveness of the earlier campus core routers is not a network problem. It is interesting that the final node to respond is 172.22.90.254 and not the probed gateway 171.66.181.249. Stephen Tingley explains: We use private IP addresses for the WAN links for all DSL clients to conserve addresses.

It is also noteworthy that 172.22.90.254 does not respond to any pings, whereas pings sent directly to 171.66.181.249 do get responses. Stephen Tingley explains: Private IP addresses are not routed outside of campus per RFC 1918.
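To check which of the addresses seen in the traceroute fall in the RFC 1918 private ranges, a small sketch using Python's standard `ipaddress` module may help. The helper name `is_rfc1918` is hypothetical; the three network blocks are the ones reserved by RFC 1918.

```python
import ipaddress

# The three private address blocks reserved by RFC 1918.
RFC1918_NETS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr):
    """Return True if addr lies in one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)


if __name__ == "__main__":
    # Addresses taken from the traceroute and ping tests described above.
    for addr in ("171.64.6.66", "172.22.90.254", "171.66.181.249"):
        kind = "private (RFC 1918)" if is_rfc1918(addr) else "public"
        print(addr, kind)
```

Only 172.22.90.254 is in a private block (172.16.0.0/12), which is consistent with it being unreachable by direct ping from outside campus while the public address of the DSL router is reachable.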

It is also noteworthy that the traceroute can sometimes take a long time to get a response from the last node, yet the RTT it then reports is small (typically in the teens of milliseconds). If the traceroute is repeated immediately, the long delay in the response from the last node usually disappears. Since the target was given as an IP address, this does not appear to be a name server lookup problem.

Resolution

The problem was reported by Teresa Downey to Stanford University DSL folks at 723-1611 on Thursday 3/1/01 at 8:00am PST. They opened a ticket with PacBell to look into the problem. Stephen Tingley of campus responded at 8:52 am.
I've looked at this router and it's going out of sync very often, this
would account for the packet loss, normally takes 10-30 seconds to come
sync up again.
The router has a log which shows the loss of 
sync and re-sync process, all with timestamps.
Loss of sync is almost always the result of a line problem.

I'm going to upgrade the software. This shouldn't fix this problem but should make the router more stable going forward. I'll need to reboot the router after, please let me know when a good time would be, a time there isn't anyone using it.

Teresa Downey reported at 9:31am:

They upgraded my router and they contacted PacBell who
think it is a phone line problem outside my house. They
are going to send someone out by 5 today. I'm to call
Nelson at Stanford support tomorrow if it is not fixed
tonight. 

Teresa reported on 3/4/01 that the problems seemed to be fixed. Pac Bell stated that the problem had been wet pairs and that they had changed the pair delivering service to Teresa's home. Further monitoring indicated that the losses appeared to have gone away for good.
Page owner: Les Cottrell