ESnet outage caused by maintenance work between Sunnyvale and San Diego

Adnan Iqbal and Les Cottrell. Page created: Jan 23, 2006


Problem

From: owner-esnet-status@lists1.slac.stanford.edu [mailto:owner-esnet-status@lists1.slac.stanford.edu] On Behalf Of operator
Sent: Monday, January 23, 2006 6:00 AM
To: esnet-status@es.net
Subject: [esnet-status] TTS#14413 SNV2-SDN1 <-10GE-> SDSC-SDN1 (NLR-SAND-SUNN-10GE-45) UP, 1/23

UPDATE,

Sunnyvale connectivity to San Diego via the 10GE was down this morning at 02:06PT (1/23) due to a maintenance by Level(3), at Tustin, CA.

The Circuit (NLR-SAND-SUNN-10GE-45) was restored at 03:34PT.

---------
Tony Quan

ESnet - The Energy Science Network 1 800-33-ESnet (1 800-333-7638)
Network Operations & Management Center +1 510-486-7600 (Outside of USA)
To report problems via Email: trouble@es.net
To request information via Email: info@es.net
To view an open trouble ticket: finger <ticket-number>@ticket.es.net
========================================================================
ESnet - Connecting people, information and resources. http://www.es.net

Analysis

Given the above information, we were interested in seeing the effect of this roughly 90 minute outage on our network measurements. Only one of the paths that we were monitoring, SLAC to SDSC, showed a route change. In the traceroutes, measured at 10 minute intervals, the route changed between 2:06:18 and 2:16:35 and reverted between 3:36:52 and 3:46:50. This agrees well with the times of 2:06 and 3:34 reported in the ESnet email.

We examined the data from our measurements to SDSC to investigate any impact and to correlate it with any performance drop. As expected, the route changed shortly after the outage started: traffic that had been flowing through 137.164.27.161 (Sunnyvale) started flowing through 137.164.27.157 (Oakland) instead. Traffic continued to use the new path until the end of the outage, and the original path returned quickly afterwards, i.e. within about 10 minutes.

Looking at the data obtained by the different tools, an effect is visible in pathchirp and ping but not really in the thrulay data. The pathchirp estimate dropped by about a factor of 8, from about 1 Gbits/s to about 120 Mbits/s. Ping showed a decrease in both the minimum and the maximum round trip time in this period; the minimum dropped by about 1 msec. In other words, the new route has lower latency but also lower bandwidth. What effect thrulay did show appeared a little later than in pathchirp and ping. None of the tools showed a persistent (> 6 hour) change, so no alert was produced.
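The comparison of successive traceroutes described above can be sketched in a few lines of Python. This is only an illustration, not the monitoring code actually used: the snapshot layout, the helper names (at, route_changes, gap_to_bracket) and the truncation of each route to its first five hops are assumptions. The timestamps and addresses are the ones quoted on this page.

```python
from datetime import datetime, timedelta

DAY = "2006-01-23 "

def at(hms):
    """Parse an HH:MM:SS time on the day of the outage (hypothetical helper)."""
    return datetime.strptime(DAY + hms, "%Y-%m-%d %H:%M:%S")

# Routes truncated to the first five hops for brevity (an assumption of this
# sketch); the fifth hop is the first one that changed.
SLAC_HOPS = ("134.79.243.1", "134.79.252.5", "134.79.135.15", "192.68.191.83")
VIA_SUNNYVALE = SLAC_HOPS + ("137.164.27.161",)
VIA_OAKLAND = SLAC_HOPS + ("137.164.27.157",)

# Only the four traceroutes that bracket the two route changes are listed.
snapshots = [
    (at("02:06:18"), VIA_SUNNYVALE),
    (at("02:16:35"), VIA_OAKLAND),
    (at("03:36:52"), VIA_OAKLAND),
    (at("03:46:50"), VIA_SUNNYVALE),
]

def route_changes(snaps):
    """Return (last time on old route, first time on new route) per change."""
    changes = []
    for (t_old, r_old), (t_new, r_new) in zip(snaps, snaps[1:]):
        if r_old != r_new:
            changes.append((t_old, t_new))
    return changes

def gap_to_bracket(reported, start, end):
    """Distance from a reported time to the bracket [start, end]; zero if inside."""
    if reported < start:
        return start - reported
    if reported > end:
        return reported - end
    return timedelta(0)

changes = route_changes(snapshots)
reported = [at("02:06:00"), at("03:34:00")]  # times from the ESnet email
gaps = [gap_to_bracket(r, s, e) for r, (s, e) in zip(reported, changes)]

for (s, e), g in zip(changes, gaps):
    print(f"route changed between {s:%H:%M:%S} and {e:%H:%M:%S}; gap to report {g}")
```

With 10 minute sampling a change can only be located to within one interval, so a reported time that falls within a few minutes of a bracket is about as good an agreement as this method can show.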

In summary, the event was most clearly detectable in the traceroute and ping tools. It was also clearly visible with the pathchirp available bandwidth tool, but not with the thrulay achievable throughput tool. The event was too short to be detected by our event analysis toolkit. Graphs of the data and a table describing the route change are presented below.
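To make the duration argument concrete, the following Python sketch checks the outage against a persistence threshold. The rule in should_alert is a hypothetical stand-in for the event analysis toolkit, whose actual logic is not described on this page; only the 6 hour figure, the outage times and the pathchirp values come from the text.

```python
from datetime import datetime, timedelta

# Persistence threshold quoted in the analysis (> 6 hours).
ALERT_PERSISTENCE = timedelta(hours=6)

def should_alert(change_start, change_end, persistence=ALERT_PERSISTENCE):
    """Hypothetical rule: alert only if a change lasts longer than `persistence`."""
    return (change_end - change_start) > persistence

# Outage times from the ESnet email (Pacific time, 23 Jan 2006).
down = datetime(2006, 1, 23, 2, 6)
up = datetime(2006, 1, 23, 3, 34)

duration = up - down
alert = should_alert(down, up)

# Ratio of the approximate pathchirp values quoted in the analysis (Mbits/s).
pathchirp_ratio = 1000 / 120

print(f"outage duration: {duration}")
print(f"alert raised: {alert}")
print(f"pathchirp drop factor: {pathchirp_ratio:.1f}")
```

Under this assumed rule, an outage of about an hour and a half is far below the threshold, which is consistent with no alert having been produced.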

[Graph: ping]

Route Before Outage	Route During Outage	Route After Outage
rtr-gsr-test 134.79.243.1 0.276 ms	rtr-gsr-test 134.79.243.1 0.268 ms	rtr-gsr-test 134.79.243.1 0.276 ms
rtr-core1-p2p-test 134.79.252.5 0.243 ms	rtr-core1-p2p-test 134.79.252.5 0.256 ms	rtr-core1-p2p-test 134.79.252.5 0.243 ms
rtr-dmz1-ger 134.79.135.15 0.232 ms	rtr-dmz1-ger 134.79.135.15 0.209 ms	rtr-dmz1-ger 134.79.135.15 0.232 ms
i2-gateway.stanford.edu 192.68.191.83 0.268 ms	i2-gateway.stanford.edu 192.68.191.83 0.281 ms	i2-gateway.stanford.edu 192.68.191.83 0.268 ms
hpr-svl-hpr--stan-ge.cenic.net 137.164.27.161 0.765 ms	hpr-oak-hpr--stan-ge.cenic.net 137.164.27.157 1.586 ms	hpr-svl-hpr--stan-ge.cenic.net 137.164.27.161 0.765 ms
lax-hpr--svl-hpr-10ge.cenic.net 137.164.25.12 42.775 ms	sac-hpr--oak-hpr-10ge.cenic.net 137.164.25.17 3.152 ms	lax-hpr--svl-hpr-10ge.cenic.net 137.164.25.12 42.775 ms
riv-hpr--lax-hpr-10ge.cenic.net 137.164.25.5 14.534 ms	riv-hpr--sac-hpr-10ge.cenic.net 137.164.25.10 11.847 ms	riv-hpr--lax-hpr-10ge.cenic.net 137.164.25.5 14.534 ms
hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net 137.164.27.54 14.162 ms	hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net 137.164.27.54 12.957 ms	hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net 137.164.27.54 14.162 ms
lightning.sdsc.edu 132.249.30.6 14.285 ms	lightning.sdsc.edu 132.249.30.6 12.986 ms	lightning.sdsc.edu 132.249.30.6 14.285 ms
node1.sdsc.edu 132.249.xxx.xxx 14.185 ms	node1.sdsc.edu 132.249.xxx.xxx 12.948 ms	node1.sdsc.edu 132.249.xxx.xxx 14.185 ms
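As a rough illustration of how the table can be read, the Python sketch below compares the "before" and "during" routes hop by hop and reports where they diverge. The data is copied from the table above (the last address is masked there and is left masked here); the list layout and the function name differing_hops are assumptions of this sketch. The round trip times are the single values shown in the table, not the ping minima discussed in the analysis.

```python
# (hostname, IP address, round trip time in ms) for each hop, from the table.
BEFORE = [
    ("rtr-gsr-test", "134.79.243.1", 0.276),
    ("rtr-core1-p2p-test", "134.79.252.5", 0.243),
    ("rtr-dmz1-ger", "134.79.135.15", 0.232),
    ("i2-gateway.stanford.edu", "192.68.191.83", 0.268),
    ("hpr-svl-hpr--stan-ge.cenic.net", "137.164.27.161", 0.765),
    ("lax-hpr--svl-hpr-10ge.cenic.net", "137.164.25.12", 42.775),
    ("riv-hpr--lax-hpr-10ge.cenic.net", "137.164.25.5", 14.534),
    ("hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net", "137.164.27.54", 14.162),
    ("lightning.sdsc.edu", "132.249.30.6", 14.285),
    ("node1.sdsc.edu", "132.249.xxx.xxx", 14.185),
]

DURING = [
    ("rtr-gsr-test", "134.79.243.1", 0.268),
    ("rtr-core1-p2p-test", "134.79.252.5", 0.256),
    ("rtr-dmz1-ger", "134.79.135.15", 0.209),
    ("i2-gateway.stanford.edu", "192.68.191.83", 0.281),
    ("hpr-oak-hpr--stan-ge.cenic.net", "137.164.27.157", 1.586),
    ("sac-hpr--oak-hpr-10ge.cenic.net", "137.164.25.17", 3.152),
    ("riv-hpr--sac-hpr-10ge.cenic.net", "137.164.25.10", 11.847),
    ("hpr-sdsc-sdsc2--riv-hpr-ge.cenic.net", "137.164.27.54", 12.957),
    ("lightning.sdsc.edu", "132.249.30.6", 12.986),
    ("node1.sdsc.edu", "132.249.xxx.xxx", 12.948),
]

def differing_hops(route_a, route_b):
    """Return the 1-based hop numbers whose IP addresses differ."""
    return [
        hop
        for hop, (a, b) in enumerate(zip(route_a, route_b), start=1)
        if a[1] != b[1]
    ]

changed = differing_hops(BEFORE, DURING)

# Difference in round trip time to the final hop, in ms.
rtt_delta = BEFORE[-1][2] - DURING[-1][2]

for hop in changed:
    print(f"hop {hop}: {BEFORE[hop - 1][0]} -> {DURING[hop - 1][0]}")
print(f"final-hop RTT lower by {rtt_delta:.3f} ms during the outage")
```

Only the hops inside the CENIC network differ; the SLAC and Stanford hops at the start and the SDSC hops at the end are the same on both routes.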