Packet loss on Stanford-Pac Bell DSL links - Mar 2001
Les Cottrell. Page created: March 1, 2001.
I am fed up. I want this DSL cancelled if they cannot fix it. What are my options?
Every night since Sunday the service has gone bad in the early evening, around 6-7 pm. There may be moments when it is decent, but for the most part it is unusable in the evenings. Last night at 1730 I did a traceroute. At that time the only things I could do were ssh into SLAC to edit a file and send mail from the command line. Pine would not work, and neither would Citrix.
->more traceroute.1933
traceroute 220.127.116.11
traceroute: Warning: ckecksums disabled
traceroute to 18.104.22.168 (22.214.171.124), 30 hops max, 40 byte packets
 3 126.96.36.199 (188.8.131.52) 0.504 ms 0.437 ms 0.430 ms
 4 * Core3-gateway.Stanford.EDU (184.108.40.206) 0.950 ms 0.658 ms
 5 Core1-gateway.Stanford.EDU (220.127.116.11) 0.932 ms 0.931 ms 0.957 ms
 6 forsythemr-gateway.Stanford.EDU (18.104.22.168) 1.772 ms 1.562 ms 1.288 ms
 7 dsl-gateway-1.Stanford.EDU (22.214.171.124) 1.435 ms 1.235 ms 1.238 ms
stalled here for about 30 seconds....
 8 172.22.90.254 (172.22.90.254) 16.641 ms 16.554 ms 16.397 ms
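To tabulate hop lines like those in the traceroute above (for example, to count `*` timeouts per hop across repeated runs), a minimal Python sketch follows. It assumes the `hop name (address) rtt ms ...` layout seen in this output; other traceroute implementations may format lines differently, and `Hop` and `parse_hop` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# A hop line: hop number, then "*" for each probe that timed out,
# "name (address)" for the responding router, and "N ms" per reply.
# This layout is an assumption based on the output shown on this page.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
NAME_ADDR_RE = re.compile(r"(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms\b")


@dataclass
class Hop:
    number: int
    name: Optional[str]
    address: Optional[str]
    rtts_ms: List[float] = field(default_factory=list)
    timeouts: int = 0


def parse_hop(line: str) -> Hop:
    """Parse one traceroute hop line; raise ValueError for other lines."""
    m = HOP_RE.match(line)
    if m is None:
        raise ValueError(f"not a hop line: {line!r}")
    number, rest = int(m.group(1)), m.group(2)
    na = NAME_ADDR_RE.search(rest)
    return Hop(
        number=number,
        name=na.group(1) if na else None,
        address=na.group(2) if na else None,
        rtts_ms=[float(x) for x in RTT_RE.findall(rest)],
        timeouts=rest.split().count("*"),  # each "*" is one unanswered probe
    )
```

Applied to hop 4 above, this yields one timeout and two RTTs (0.95 and 0.658 ms), matching the single `*` in that line.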
More details of the analysis of the 40,000 pings show the distribution of:
The PingER history of RTT and loss clearly shows the complete lack of reachability for most of March 4th (GMT), and the marked improvement in loss afterwards.
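Ping samples such as the 40,000 mentioned above are commonly summarized by overall loss and by how the losses cluster into consecutive runs, which separates scattered single drops from outages lasting many pings. This page does not record exactly which distribution was analyzed, so the following is only a minimal sketch, assuming each ping is stored in send order as an RTT in ms, or `None` when no reply arrived; `loss_summary` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def loss_summary(rtts: Sequence[Optional[float]]) -> Tuple[float, Counter]:
    """Return (loss percentage, Counter of burst length -> number of bursts).

    rtts holds one entry per ping in send order; None marks a lost ping.
    A burst is a maximal run of consecutive lost pings.
    """
    if not rtts:
        raise ValueError("no pings")
    lost = sum(1 for r in rtts if r is None)
    bursts: Counter = Counter()
    run = 0
    for r in rtts:
        if r is None:
            run += 1
        elif run:
            bursts[run] += 1  # a reply ends the current run of losses
            run = 0
    if run:  # close a burst that extends to the end of the sample
        bursts[run] += 1
    return 100.0 * lost / len(rtts), bursts
```

A line that drops out of sync for tens of seconds at a time would show up as a few long bursts rather than many bursts of length one.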
The fact that the router DSL-GATEWAY-1.STANFORD.EDU responds to all pings suggests that the non-responsiveness of the earlier campus core routers is not a network problem. It is also interesting that the final node responding is 172.22.90.254 rather than the probed gateway 126.96.36.199. Stephen Tingley explains: We use private IP addresses for the WAN links for all DSL clients to conserve addresses.
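The reasoning above can be sketched as a simple heuristic: pings to later hops are forwarded through the earlier routers, so if a later hop answers reliably, missing replies from an earlier router reflect that router not answering rather than packets being dropped in transit. This is an illustrative assumption, not the method used by PingER or traceroute, and it ignores complications such as asymmetric return paths; `response_only_loss` is a hypothetical helper taking per-hop loss fractions ordered from nearest to farthest.

```python
from typing import List


def response_only_loss(hop_loss: List[float]) -> List[float]:
    """For each hop, the loss fraction not also seen at that hop or beyond.

    hop_loss[i] is the fraction of probes to hop i with no reply, hop 0
    being nearest. Loss that disappears at a later hop cannot be
    forwarding loss at hop i, so it is attributed to hop i declining to
    answer probes addressed to it.
    """
    result: List[float] = []
    downstream_min = float("inf")
    for loss in reversed(hop_loss):
        downstream_min = min(downstream_min, loss)
        result.append(loss - downstream_min)
    return list(reversed(result))
```

A hop with a large value here, followed by hops that answer every ping, matches the pattern described for the campus core routers.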
It is also noteworthy that 172.22.90.254 does not respond to any pings, whereas pings sent directly to 188.8.131.52 are answered. Stephen Tingley explains: Private IP addresses are not routed outside of campus per RFC 1918.
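Whether an address falls in one of the RFC 1918 private blocks can be checked mechanically. A minimal Python sketch using the standard `ipaddress` module (the name `is_rfc1918` is just for illustration):

```python
import ipaddress

# The three private-use IPv4 blocks reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """True if addr lies in one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


print(is_rfc1918("172.22.90.254"))  # True
```

172.22.90.254 lies inside 172.16.0.0/12, which spans 172.16.0.0 to 172.31.255.255. The three blocks are listed explicitly because the module's `is_private` property also returns True for other reserved ranges, such as loopback and link-local addresses.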
It is also noteworthy that the traceroute sometimes takes a long time to get a response from the last node, yet the reported RTT is small (typically in the teens of milliseconds). If the traceroute is repeated immediately, the long delay for the last node usually disappears. Since the IP address is being provided, it would not appear to be a name server lookup problem.
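A long wait can coexist with a small RTT because a traceroute-style RTT covers only the interval from sending a probe to receiving its reply; time spent outside that window before the line is printed makes the user wait without changing the reported figure. The sketch below illustrates only this timing distinction with simulated delays; `timed_probe` and both sleep durations are hypothetical stand-ins, not a model of what caused the delay seen here.

```python
import time
from typing import Callable, Tuple


def timed_probe(send_and_wait: Callable[[], None],
                after_reply: Callable[[], None]) -> Tuple[float, float]:
    """Return (rtt_ms, wall_ms) for one simulated probe.

    rtt_ms covers only send_and_wait(); wall_ms also includes
    after_reply(), i.e. any work done before the result is shown.
    """
    start = time.monotonic()
    send_and_wait()
    rtt_ms = (time.monotonic() - start) * 1000.0
    after_reply()
    wall_ms = (time.monotonic() - start) * 1000.0
    return rtt_ms, wall_ms


# Hypothetical stand-ins: a ~15 ms round trip, then a 0.5 s delay before output.
rtt, wall = timed_probe(lambda: time.sleep(0.015), lambda: time.sleep(0.5))
print(f"reported RTT {rtt:.0f} ms, user waited {wall:.0f} ms")
```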
I've looked at this router and it is going out of sync very often; this would account for the packet loss. It normally takes 10-30 seconds to come back into sync. The router has a log which shows the loss-of-sync and re-sync process, all with timestamps. Loss of sync is almost always the result of a line problem.
Teresa Downey reported at 9:31am:
I'm going to upgrade the software. This shouldn't fix this problem, but it should make the router more stable going forward. I'll need to reboot the router afterwards, so please let me know a good time to do it, when no one is using it.
They upgraded my router and they contacted PacBell, who think it is a phone line problem outside my house. They are going to send someone out by 5 today. I'm to call Nelson at Stanford support tomorrow if it is not fixed tonight.
Teresa reported on 3/4/01 that the problems seemed to be fixed. Pac Bell stated that the problem had been wet pairs and that they had changed the pair delivering service to Teresa's home. Further monitoring indicated that the losses appeared to have gone away for good.
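The router log described earlier, which records each loss of sync and re-sync with timestamps, can be reduced to a list of outage lengths and compared with the quoted 10-30 second re-sync time. The actual log layout is not shown on this page, so the sketch below assumes a hypothetical format of one event per line, `YYYY-MM-DD HH:MM:SS EVENT`, with made-up event names `LOSS OF SYNC` and `SYNC ACQUIRED`; `outage_durations` is likewise a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# Hypothetical log layout: "YYYY-MM-DD HH:MM:SS EVENT" (19-char timestamp).
# The event names below are assumptions, not the router's real messages.
TIMESTAMP_FMT = "%Y-%m-%d %H:%M:%S"
LOSS_EVENT = "LOSS OF SYNC"
RESYNC_EVENT = "SYNC ACQUIRED"


def outage_durations(lines: Iterable[str]) -> List[timedelta]:
    """Pair each loss-of-sync event with the next re-sync event."""
    durations: List[timedelta] = []
    lost_at: Optional[datetime] = None
    for line in lines:
        stamp, event = line[:19], line[20:].strip()
        t = datetime.strptime(stamp, TIMESTAMP_FMT)
        if event == LOSS_EVENT:
            lost_at = t
        elif event == RESYNC_EVENT and lost_at is not None:
            durations.append(t - lost_at)
            lost_at = None
    return durations
```

Counting outages per evening, and their lengths, would show whether the losses cluster at the times the user reported.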