ESnet outage because of repair work between Sunnyvale and San Diego

Adnan Iqbal and Les Cottrell. Page created: Jan 23, 2006


Problem

From: owner-esnet-status@lists1.slac.stanford.edu [mailto:owner-esnet-status@lists1.slac.stanford.edu] On Behalf Of operator
Sent: Monday, January 23, 2006 6:00 AM
To: esnet-status@es.net
Subject: [esnet-status] TTS#14413 SNV2-SDN1 <-10GE-> SDSC-SDN1 (NLR-SAND-SUNN-10GE-45) UP, 1/23

UPDATE,

Sunnyvale connectivity to San Diego via the 10GE was down this morning at 02:06PT (1/23) due to a maintenance by Level(3), at Tustin, CA.

The Circuit (NLR-SAND-SUNN-10GE-45) was restored at 03:34PT.

---------
Tony Quan

ESnet - The Energy Science Network 1 800-33-ESnet (1 800-333-7638)
Network Operations & Management Center +1 510-486-7600 (Outside of USA)
To report problems via Email: trouble@es.net
To request information via Email: info@es.net
To view an open trouble ticket: finger <ticket-number>@ticket.es.net
========================================================================
ESnet - Connecting people, information and resources. http://www.es.net

Analysis

Given the above information, we wanted to see how this ~90 minute outage affected our network measurements. Only one of the paths we monitor, SLAC to SDSC, showed a route change. In the traceroutes, which are measured at 10-minute intervals, the route change occurred between 2:06:18 and 2:16:35 and reverted between 3:36:52 and 3:46:50. This agrees well with the outage and restoration times of 02:06 and 03:34 given in the ESnet report.

We examined the data from our measurements to SDSC to look for any impact and to correlate the route change with any drop in performance. Shortly after the outage started, the route changed as expected: traffic that had been flowing through 137.164.27.161 (Sunnyvale) was rerouted through 137.164.27.157 (Oakland). Traffic stayed on the new path until the end of the outage, and the original path returned shortly afterwards, i.e., within about 10 minutes.

Looking at the data from the different measurement tools, the effect is visible in the pathchirp and ping data but not really in the thrulay data. The pathchirp available bandwidth dropped by nearly an order of magnitude, from about 1 Gbit/s to about 120 Mbit/s. Ping showed an improvement in both the minimum and the maximum round trip time during this period; the minimum RTT dropped by about 1 ms. In other words, the new route has lower latency but also lower available bandwidth. Thrulay reported these changes a little later than pathchirp and ping. None of the tools showed a change that persisted long enough (> 6 hours) to produce an alert.
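Because the traceroutes run only every 10 minutes, each route change can only be bracketed between the last sample on the old route and the first sample on the new one. The following is a minimal Python sketch of that bracketing, assuming a hypothetical in-memory list of (timestamp, hop IPs) samples; it is not the SLAC measurement code. The hop lists come from the route table in this case study, the four timestamps are the bracketing times quoted above, and the intermediate samples taken during the outage are omitted.

```python
from datetime import datetime
from typing import List, Tuple

# Assumed (hypothetical) sample format: (timestamp, hop IP addresses) per traceroute.
Route = Tuple[str, ...]
Sample = Tuple[datetime, Route]


def bracket_route_changes(samples: List[Sample]) -> List[Tuple[datetime, datetime, Route]]:
    """For each route change, return (last sample on old route, first sample on new route, new route).

    With one traceroute every ~10 minutes, a change can only be located to
    within the gap between two consecutive samples.
    """
    changes = []
    for (t_prev, r_prev), (t_cur, r_cur) in zip(samples, samples[1:]):
        if r_cur != r_prev:
            changes.append((t_prev, t_cur, r_cur))
    return changes


# Hop IPs from the route table in this case study (final SDSC host omitted).
NORMAL: Route = ("134.79.243.1", "134.79.252.5", "134.79.135.15", "192.68.191.83",
                 "137.164.27.161", "137.164.25.12", "137.164.25.5",
                 "137.164.27.54", "132.249.30.6")
DETOUR: Route = ("134.79.243.1", "134.79.252.5", "134.79.135.15", "192.68.191.83",
                 "137.164.27.157", "137.164.25.17", "137.164.25.10",
                 "137.164.27.54", "132.249.30.6")

day = datetime(2006, 1, 23)
# Only the four samples that bound the two changes are listed here.
samples: List[Sample] = [
    (day.replace(hour=2, minute=6, second=18), NORMAL),
    (day.replace(hour=2, minute=16, second=35), DETOUR),
    (day.replace(hour=3, minute=36, second=52), DETOUR),
    (day.replace(hour=3, minute=46, second=50), NORMAL),
]

changes = bracket_route_changes(samples)
for last_old, first_new, _ in changes:
    print(f"route changed between {last_old:%H:%M:%S} and {first_new:%H:%M:%S}")
# route changed between 02:06:18 and 02:16:35
# route changed between 03:36:52 and 03:46:50

# Assuming the detour was in place continuously between samples:
(start_lo, start_hi, _), (end_lo, end_hi, _) = changes
shortest = end_lo - start_hi  # detour seen at start_hi and still seen at end_lo
longest = end_hi - start_lo   # change may have begun just after start_lo
print(f"detour lasted between {shortest} and {longest}")
# detour lasted between 1:20:17 and 1:40:32
```

Under that continuity assumption, the two brackets bound the visible detour to roughly 80 to 100 minutes, consistent with the ~88 minutes (02:06 to 03:34) in the ESnet report.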

In summary, the event was most clearly detectable with the traceroute and ping tools. It was also clearly visible with the pathchirp available bandwidth tool, but not with the thrulay achievable throughput tool. The event was too short to be detected by our event analysis toolkit. Graphs of the data and a table describing the route change are presented below.
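The case study does not say how the event analysis toolkit decides that a change is persistent, beyond the > 6 hour criterion. As an illustration only, the sketch below assumes a simple rule: alert when consecutive samples stay below a fraction of a baseline for more than 6 hours. The function names, the 50% threshold, and the series are hypothetical; the series is synthetic, built from the approximate pathchirp levels quoted above (about 1000 Mbit/s normally, about 120 Mbit/s during the outage) rather than from the measured data.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[datetime, float]  # (time, available bandwidth in Mbit/s)


def first_persistent_drop(series: Iterable[Sample], baseline: float,
                          drop_fraction: float = 0.5,
                          min_duration: timedelta = timedelta(hours=6)) -> Optional[datetime]:
    """Return when the first sufficiently long drop began, or None.

    A sample counts as "dropped" if it is below baseline * drop_fraction; an
    alert needs consecutive dropped samples spanning more than min_duration.
    """
    run_start: Optional[datetime] = None
    for t, value in series:
        if value < baseline * drop_fraction:
            if run_start is None:
                run_start = t
            if t - run_start > min_duration:
                return run_start
        else:
            run_start = None  # any recovered sample resets the run
    return None


def synthetic_series(drop_start: datetime, drop_end: datetime) -> List[Sample]:
    """Synthetic 10-minute samples for one day: 120 Mbit/s in [drop_start, drop_end), else 1000."""
    t, end, out = datetime(2006, 1, 23), datetime(2006, 1, 24), []
    while t < end:
        out.append((t, 120.0 if drop_start <= t < drop_end else 1000.0))
        t += timedelta(minutes=10)
    return out


day = datetime(2006, 1, 23)
# Roughly the 02:06-03:34 outage window from the ESnet report.
outage = synthetic_series(day.replace(hour=2, minute=10), day.replace(hour=3, minute=40))
print(first_persistent_drop(outage, baseline=1000.0))
# None

# A hypothetical 8-hour degradation, for contrast.
long_drop = synthetic_series(day.replace(hour=2), day.replace(hour=10))
print(first_persistent_drop(long_drop, baseline=1000.0))
# 2006-01-23 02:00:00
```

A ~90-minute drop never accumulates 6 hours of consecutive low samples, so no alert is raised, mirroring what happened here. This sketch does no smoothing, so a single noisy high sample inside a real drop would reset the run.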


[Graph: ping]

[Graph: pathchirp]

[Graph: thrulay]

Hop | Route before outage | Route during outage | Route after outage
--- | --- | --- | ---
1 | rtr-gsr-test (134.79.243.1) 0.276 ms | rtr-gsr-test (134.79.243.1) 0.268 ms | rtr-gsr-test (134.79.243.1) 0.276 ms
2 | rtr-core1-p2p-test (134.79.252.5) 0.243 ms | rtr-core1-p2p-test (134.79.252.5) 0.256 ms | rtr-core1-p2p-test (134.79.252.5) 0.243 ms
3 | rtr-dmz1-ger (134.79.135.15) 0.232 ms | rtr-dmz1-ger (134.79.135.15) 0.209 ms | rtr-dmz1-ger (134.79.135.15) 0.232 ms
4 | i2-gateway.stanford.edu (192.68.191.83) 0.268 ms | i2-gateway.stanford.edu (192.68.191.83) 0.281 ms | i2-gateway.stanford.edu (192.68.191.83) 0.268 ms
5 | hpr-svl-hpr--stan-ge.cenic.net (137.164.27.161) 0.765 ms | hpr-oak-hpr--stan-ge.cenic.net (137.164.27.157) 1.586 ms | hpr-svl-hpr--stan-ge.cenic.net (137.164.27.161) 0.765 ms
6 | lax-hpr--svl-hpr-10ge.cenic.net (137.164.25.12) 42.775 ms | sac-hpr--oak-hpr-10ge.cenic.net (137.164.25.17) 3.152 ms | lax-hpr--svl-hpr-10ge.cenic.net (137.164.25.12) 42.775 ms
7 | riv-hpr--lax-hpr-10ge.cenic.net (137.164.25.5) 14.534 ms | riv-hpr--sac-hpr-10ge.cenic.net (137.164.25.10) 11.847 ms | riv-hpr--lax-hpr-10ge.cenic.net (137.164.25.5) 14.534 ms
8 | hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net (137.164.27.54) 14.162 ms | hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net (137.164.27.54) 12.957 ms | hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net (137.164.27.54) 14.162 ms
9 | lightning.sdsc.edu (132.249.30.6) 14.285 ms | lightning.sdsc.edu (132.249.30.6) 12.986 ms | lightning.sdsc.edu (132.249.30.6) 14.285 ms
10 | node1.sdsc.edu (132.249.xxx.xxx) 14.185 ms | node1.sdsc.edu (132.249.xxx.xxx) 12.948 ms | node1.sdsc.edu (132.249.xxx.xxx) 14.185 ms
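Comparing the before and during columns of the route table hop by hop shows where the detour begins and ends. Below is a minimal Python sketch using the hop IPs from the table (the final SDSC host, whose address is elided, is left out); the function name `divergent_segment` is hypothetical and not part of any SLAC tool.

```python
from typing import List, Tuple

# Hop IPs from the route table; comments give the hostnames of the hops that differ.
BEFORE = ["134.79.243.1", "134.79.252.5", "134.79.135.15", "192.68.191.83",
          "137.164.27.161",  # hpr-svl-hpr--stan-ge.cenic.net
          "137.164.25.12",   # lax-hpr--svl-hpr-10ge.cenic.net
          "137.164.25.5",    # riv-hpr--lax-hpr-10ge.cenic.net
          "137.164.27.54", "132.249.30.6"]
DURING = ["134.79.243.1", "134.79.252.5", "134.79.135.15", "192.68.191.83",
          "137.164.27.157",  # hpr-oak-hpr--stan-ge.cenic.net
          "137.164.25.17",   # sac-hpr--oak-hpr-10ge.cenic.net
          "137.164.25.10",   # riv-hpr--sac-hpr-10ge.cenic.net
          "137.164.27.54", "132.249.30.6"]


def divergent_segment(a: List[str], b: List[str]) -> Tuple[int, int]:
    """Return 1-based (first, last) hop numbers at which two routes differ.

    Compares index by index, so it assumes both routes have the same hop
    count (true here); routes of different lengths would need an alignment.
    """
    diffs = [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]
    if not diffs:
        raise ValueError("routes are identical")
    return diffs[0], diffs[-1]


first, last = divergent_segment(BEFORE, DURING)
print(f"routes differ at hops {first}-{last} and rejoin at hop {last + 1}")
# routes differ at hops 5-7 and rejoin at hop 8
```

Hops 5-7 are the CENIC segment that moved from the Sunnyvale path to the Oakland path. Judging by the hostnames, hop 7 appears to be the same riv-hpr router reached over a different link, which is why its address changes even though the router does not.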
[Graph: network topology]