ESnet outage because of repair work between Sunnyvale and San Diego
Adnan Iqbal and Les Cottrell. Page created: Jan 23, 2006
Problem
From: owner-esnet-status@lists1.slac.stanford.edu
[mailto:owner-esnet-status@lists1.slac.stanford.edu] On Behalf Of operator
Sent: Monday, January 23, 2006 6:00 AM
To: esnet-status@es.net
Subject: [esnet-status] TTS#14413 SNV2-SDN1 <-10GE-> SDSC-SDN1
(NLR-SAND-SUNN-10GE-45) UP, 1/23
UPDATE,
Sunnyvale connectivity to San Diego via the 10GE was down this morning at
02:06PT (1/23) due to a maintenance by Level(3), at Tustin, CA.
The Circuit (NLR-SAND-SUNN-10GE-45) was restored at 03:34PT.
---------
Tony Quan
ESnet - The Energy Science Network 1 800-33-ESnet (1 800-333-7638)
Network Operations & Management Center +1 510-486-7600 (Outside of USA)
To report problems via Email: trouble@es.net
To request information via Email: info@es.net
To view an open trouble ticket: finger <ticket-number>@ticket.es.net
========================================================================
ESnet - Connecting people, information and resources. http://www.es.net
Analysis
Given the above information, we wanted to see how this roughly 90-minute outage affected our network measurements. Only one of the paths we monitor, SLAC to SDSC, showed a route change. In traceroutes measured at 10-minute intervals, the route changed between 2:06:18 and 2:16:35 and reverted between 3:36:52 and 3:46:50. This agrees well with the times in the ESnet email report, 2:06 and 3:34.
We examined the data from our measurements to SDSC to look for any impact and to correlate it with any performance drop. As expected, the route changed shortly after the outage started: traffic that had been flowing through 137.164.27.161 (Sunnyvale) switched to 137.164.27.157 (Oakland) and stayed on the new path until the end of the outage. The original path returned quickly once the circuit was restored, within roughly one 10-minute measurement interval.
Comparing the tools, the effect is visible in the pathchirp and ping data but not clearly in the thrulay data. The pathchirp estimate dropped by roughly a factor of 8, from about 1 Gbits/s to about 120 Mbits/s. Ping showed a decrease in both the minimum and the maximum round-trip time during this period; the minimum dropped by about 1 ms. In other words, the new route has lower latency but also lower available bandwidth. What change thrulay did report appeared slightly later than in pathchirp and ping. None of the tools showed a persistent (> 6 hour) change, so no alert was produced.
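The timing comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of our measurement toolkit: the helper `at` and the variable names are hypothetical, the clock readings are those quoted above, and the alert rule (a change must persist for more than 6 hours) is a simplification of whatever the event analysis actually does.

```python
from datetime import datetime, timedelta

def at(hms: str) -> datetime:
    """Parse a Pacific-time clock reading on 23 Jan 2006 (hypothetical helper)."""
    return datetime.strptime(f"2006-01-23 {hms}", "%Y-%m-%d %H:%M:%S")

# Times from the ESnet email (minute resolution).
reported_down = at("02:06:00")
reported_up = at("03:34:00")

# Consecutive traceroute samples that bracket each route change.
change_window = (at("02:06:18"), at("02:16:35"))
revert_window = (at("03:36:52"), at("03:46:50"))

# Length of the outage as reported by ESnet.
reported_outage = reported_up - reported_down

# Bounds on how long the alternate route was in use, given the sampling.
min_on_alternate = revert_window[0] - change_window[1]
max_on_alternate = revert_window[1] - change_window[0]

# Assumption: an alert needs a change that persists for more than 6 hours.
ALERT_THRESHOLD = timedelta(hours=6)
could_alert = max_on_alternate > ALERT_THRESHOLD

print("reported outage:", reported_outage)
print("alternate route in use for between", min_on_alternate, "and", max_on_alternate)
print("long enough to alert:", could_alert)
```

With these inputs the reported outage is 88 minutes, and the traceroute samples bound the time spent on the alternate route to between 1:20:17 and 1:40:32, well short of the 6-hour persistence needed for an alert.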
In summary, the event was most clearly detectable with the traceroute and ping tools. It was also clearly visible with the pathchirp available bandwidth tool, but not with the thrulay achievable throughput tool. The event was too short to be detected by our event analysis toolkit. Graphs of the data and a table describing the route change are presented below.
Route Before Outage | Route during Outage | Route After Outage |
rtr-gsr-test 134.79.243.1 0.276 ms | rtr-gsr-test 134.79.243.1 0.268 ms | rtr-gsr-test 134.79.243.1 0.276 ms |
rtr-core1-p2p-test 134.79.252.5 0.243 ms | rtr-core1-p2p-test 134.79.252.5 0.256 ms | rtr-core1-p2p-test 134.79.252.5 0.243 ms |
rtr-dmz1-ger 134.79.135.15 0.232 ms | rtr-dmz1-ger 134.79.135.15 0.209 ms | rtr-dmz1-ger 134.79.135.15 0.232 ms |
i2-gateway.stanford.edu 192.68.191.83 0.268 ms | i2-gateway.stanford.edu 192.68.191.83 0.281 ms | i2-gateway.stanford.edu 192.68.191.83 0.268 ms |
hpr-svl-hpr--stan-ge.cenic.net 137.164.27.161 0.765 ms | hpr-oak-hpr--stan-ge.cenic.net 137.164.27.157 1.586 ms | hpr-svl-hpr--stan-ge.cenic.net 137.164.27.161 0.765 ms |
lax-hpr--svl-hpr-10ge.cenic.net 137.164.25.12 42.775 ms | sac-hpr--oak-hpr-10ge.cenic.net 137.164.25.17 3.152 ms | lax-hpr--svl-hpr-10ge.cenic.net 137.164.25.12 42.775 ms |
riv-hpr--lax-hpr-10ge.cenic.net 137.164.25.5 14.534 ms | riv-hpr--sac-hpr-10ge.cenic.net 137.164.25.10 11.847 ms | riv-hpr--lax-hpr-10ge.cenic.net 137.164.25.5 14.534 ms |
hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net 137.164.27.54 14.162 ms | hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net 137.164.27.54 12.957 ms | hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net 137.164.27.54 14.162 ms |
lightning.sdsc.edu 132.249.30.6 14.285 ms | lightning.sdsc.edu 132.249.30.6 12.986 ms | lightning.sdsc.edu 132.249.30.6 14.285 ms |
node1.sdsc.edu 132.249.xxx.xxx 14.185 ms | node1.sdsc.edu 132.249.xxx.xxx 12.948 ms | node1.sdsc.edu 132.249.xxx.xxx 14.185 ms |
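To make the route change in the table explicit, the hop addresses can be compared position by position. This is a small illustrative sketch; the function `changed_hops` and the list names are hypothetical, the addresses and round-trip times are copied from the table, and the final hop is kept elided as in the source.

```python
from itertools import zip_longest

# Hop IP addresses copied from the table above (last hop elided in the source).
before = [
    "134.79.243.1", "134.79.252.5", "134.79.135.15", "192.68.191.83",
    "137.164.27.161", "137.164.25.12", "137.164.25.5", "137.164.27.54",
    "132.249.30.6", "132.249.xxx.xxx",
]
during = [
    "134.79.243.1", "134.79.252.5", "134.79.135.15", "192.68.191.83",
    "137.164.27.157", "137.164.25.17", "137.164.25.10", "137.164.27.54",
    "132.249.30.6", "132.249.xxx.xxx",
]

def changed_hops(a, b):
    """Return the 1-based hop numbers at which two traceroutes disagree."""
    pairs = enumerate(zip_longest(a, b), start=1)
    return [hop for hop, (x, y) in pairs if x != y]

# Hops that differ between the normal route and the route during the outage.
print("changed hops:", changed_hops(before, during))

# Round-trip time to the final hop (node1.sdsc.edu), in ms, from the table.
rtt_before_ms = 14.185
rtt_during_ms = 12.948
rtt_drop_ms = rtt_before_ms - rtt_during_ms
print(f"end-to-end RTT drop: {rtt_drop_ms:.3f} ms")
```

Only hops 5 to 7 differ (the Sunnyvale and Los Angeles segment is replaced by Oakland and Sacramento), and the end-to-end round-trip time falls by about 1.2 ms, consistent with the drop of about 1 ms in the minimum ping noted in the analysis.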