ESnet outage because of repair work between Sunnyvale and San Diego

Adnan Iqbal and Les Cottrell. Page created: Jan 23, 2006



From: [] On Behalf Of operator
Sent: Monday, January 23, 2006 6:00 AM
Subject: [esnet-status] TTS#14413 SNV2-SDN1 <-10GE-> SDSC-SDN1 (NLR-SAND-SUNN-10GE-45) UP, 1/23


Sunnyvale connectivity to San Diego via the 10GE was down this morning at 02:06PT (1/23) due to a maintenance by Level(3), at Tustin, CA.

The Circuit (NLR-SAND-SUNN-10GE-45) was restored at 03:34PT.

Tony Quan

ESnet - The Energy Science Network 1 800-33-ESnet (1 800-333-7638)
Network Operations & Management Center +1 510-486-7600 (Outside of USA)
To report problems via Email:
To request information via Email:
To view an open trouble ticket: finger <ticket-number>
ESnet - Connecting people, information and resources.




    Given the above information, we were interested in seeing the effect of this roughly 90 minute outage on our network measurements. Only one of the paths we were monitoring, SLAC to SDSC, showed a route change. Looking at the traceroutes measured at 10 minute intervals, the route change occurred between 2:06:18 and 2:16:35 and reverted between 3:36:52 and 3:46:50. This agrees well with the times of 2:06 and 3:34 reported in the ESnet email.

    We examined the data from our measurements to SDSC to investigate any impact and to correlate the route change with any performance drop. As expected, the route changed soon after the outage started: traffic that had been flowing through Sunnyvale switched to a path through Oakland, and it stayed on the new path until the end of the outage. The original path returned soon after the circuit was restored: it was back by the 3:46:50 traceroute, i.e. within about 13 minutes.

    Comparing the data from the different tools, the effect is visible in pathchirp and ping but not really in the thrulay data. The pathchirp estimate dropped by nearly a factor of 10, from about 1 Gbits/s to about 120 Mbits/s. Ping showed a decrease (an improvement) in both the minimum and the maximum round trip time during this period; the minimum dropped by about 1 msec. In other words, the new route had lower latency but also lower available bandwidth. Thrulay reported the changes a little later than pathchirp and ping. None of the tools showed a persistent (> 6 hour) change, so no alert was produced.
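The way the traceroute samples bracket each route change can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the monitoring code used for these measurements: the function names `find_route_changes` and `gap_to_window` are hypothetical, the labels "via Sunnyvale" and "via Oakland" stand in for the full hop lists, and the ESnet times are taken as exactly 02:06:00 and 03:34:00 because the email gives them only to the minute. The four sample timestamps are the ones quoted above.

```python
from datetime import datetime, timedelta

def parse(hms):
    """Parse a time of day on the date of the outage (Jan 23, 2006)."""
    return datetime.strptime("2006-01-23 " + hms, "%Y-%m-%d %H:%M:%S")

def find_route_changes(samples):
    """Return (last_old, first_new, old_route, new_route) for each change
    between consecutive traceroute samples."""
    changes = []
    for (t_prev, r_prev), (t_next, r_next) in zip(samples, samples[1:]):
        if r_prev != r_next:
            changes.append((t_prev, t_next, r_prev, r_next))
    return changes

def gap_to_window(t, start, end):
    """Distance from time t to the interval [start, end]; zero if inside."""
    if t < start:
        return start - t
    if t > end:
        return t - end
    return timedelta(0)

# Route labels are placeholders (assumption) for the measured hop lists.
samples = [
    (parse("02:06:18"), "via Sunnyvale"),
    (parse("02:16:35"), "via Oakland"),
    (parse("03:36:52"), "via Oakland"),
    (parse("03:46:50"), "via Sunnyvale"),
]

# Times from the ESnet email, assumed to be on the minute.
reported_down = parse("02:06:00")
reported_up = parse("03:34:00")

changes = find_route_changes(samples)
for reported, (t_old, t_new, r_old, r_new) in zip((reported_down, reported_up), changes):
    gap = gap_to_window(reported, t_old, t_new)
    print(f"{r_old} -> {r_new}: between {t_old.time()} and {t_new.time()}, "
          f"reported time is {gap} outside this window")
```

With these inputs the reported outage start falls 18 seconds before the first window, and the reported restoration falls just under 3 minutes before the second, both well inside the 10 minute sampling interval.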

In summary, the event was most clearly detectable with the traceroute and ping tools. It was also clearly visible with the pathchirp available bandwidth tool, but not with the thrulay achievable throughput tool. The event was too short to be detected by our event analysis toolkit. Graphs of the data and a table describing the route change are presented below.
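The reason no alert was raised comes down to a duration comparison, which the short sketch below illustrates. It assumes the only criterion is the "> 6 hour" persistence mentioned above; the internal logic of the event analysis toolkit is not described here, so the function `would_alert` and the constant name are hypothetical.

```python
from datetime import datetime, timedelta

# Persistence criterion quoted in the text; the constant name is an assumption.
PERSISTENCE_THRESHOLD = timedelta(hours=6)

def would_alert(change_start, change_end, threshold=PERSISTENCE_THRESHOLD):
    """Return True only if the change lasted longer than the threshold."""
    return (change_end - change_start) > threshold

# Outage times from the ESnet email (02:06 PT to 03:34 PT on Jan 23, 2006).
start = datetime(2006, 1, 23, 2, 6)
end = datetime(2006, 1, 23, 3, 34)

duration = end - start
print(duration)                 # 1:28:00
print(would_alert(start, end))  # False
```

At 88 minutes, the outage is far below the 6 hour threshold.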




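Route Before Outage            Route During Outage            Route After Outage
rtr-gsr-test         0.276 ms  rtr-gsr-test         0.268 ms  rtr-gsr-test         0.276 ms
rtr-core1-p2p-test   0.243 ms  rtr-core1-p2p-test   0.256 ms  rtr-core1-p2p-test   0.243 ms
rtr-dmz1-ger         0.232 ms  rtr-dmz1-ger         0.209 ms  rtr-dmz1-ger         0.232 ms
(name missing)       0.268 ms  (name missing)       0.281 ms  (name missing)       0.268 ms
(name missing)       0.765 ms  (name missing)       1.586 ms  (name missing)       0.765 ms
(name missing)      42.775 ms  (name missing)       3.152 ms  (name missing)      42.775 ms
(name missing)      14.534 ms  (name missing)      11.847 ms  (name missing)      14.534 ms
(name missing)      14.162 ms  (name missing)      12.957 ms  (name missing)      14.162 ms
(name missing)      14.285 ms  (name missing)      12.986 ms  (name missing)      14.285 ms
(name missing)      14.185 ms  (name missing)      12.948 ms  (name missing)      14.185 ms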
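As a rough cross-check of the route table against the ping result, the per-hop round trip times can be compared directly. The sketch below uses only the values listed in the table; the variable names are illustrative. Each value is a single traceroute sample, so individual hops (the 42.775 ms entry, for example) can be noisy.

```python
# Per-hop round trip times in ms, in the order listed in the route table.
before = [0.276, 0.243, 0.232, 0.268, 0.765, 42.775, 14.534, 14.162, 14.285, 14.185]
during = [0.268, 0.256, 0.209, 0.281, 1.586, 3.152, 11.847, 12.957, 12.986, 12.948]
after = [0.276, 0.243, 0.232, 0.268, 0.765, 42.775, 14.534, 14.162, 14.285, 14.185]

# Change in RTT at each hop while the alternate route was in use.
deltas = [d - b for b, d in zip(before, during)]

for hop, delta in enumerate(deltas, start=1):
    print(f"hop {hop:2d}: {delta:+8.3f} ms")

# The last hop gives the end-to-end change seen by traceroute.
print(f"end-to-end change: {deltas[-1]:+.3f} ms")
print("after-outage column identical to before-outage column:", after == before)
```

The last-hop difference is about 1.2 ms lower during the outage, in line with the roughly 1 msec drop in the minimum ping time.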
Topology Graph