Problems with link to/from BNL
Les Cottrell. Page created: August 16, 2005
Qwest advises that the local carrier has confirmed there is a fiber cut which caused the OC48 outage to Brookhaven National Laboratory. The cut has not yet been pinpointed and there is no estimated uptime yet. Connectivity to BNL was lost at 15:44PT.
Steve Lowe
ESnet - The Energy Science Network
========================================================================
Details and tracking of the problem can be found by doing a finger 13716@ticket.es.net
Soon after that (7:45pm) we received an anomalous network event email from the BNL IEPM-BW monitoring host at iepmbw.bnl.gov. This email identified events seen by the Plateau Algorithm in: iperf data to Caltech, CERN and SLAC; thrulay data to ANL, Daresbury (near Liverpool, UK), Indiana, and the University of Florida; and pathchirp data to SDSC, SLAC, and the University of Florida. Further alerts were sent from iepmbw.bnl.gov at 9:45pm, 10:45pm, and 11:45pm, and at 12:45am, 1:45am, and 2:23am the next day. Alerts were also sent from the IEPM-BW monitoring host at CERN, pcgiga.cern.ch, identifying events in the iperf measurements to BNL.
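The idea behind plateau-style event detection can be illustrated with a short sketch. This is not the IEPM-BW implementation: the function name, buffer sizes, and the threshold factor `beta` are all assumptions chosen for illustration. It keeps a history buffer of recent "normal" measurements and flags an event only when several consecutive samples fall well below the history mean, so that a single bad measurement is treated as noise.

```python
from collections import deque
from statistics import mean, pstdev


def detect_drops(samples, history_len=20, trigger_len=3, beta=2.0):
    """Return indices where a sustained drop below the baseline is flagged.

    Hypothetical parameters (not taken from IEPM-BW):
      history_len  number of samples used to learn the baseline
      trigger_len  consecutive low samples needed to raise an event
      beta         how many standard deviations below the mean counts as low
    """
    history = deque(maxlen=history_len)  # recent "normal" samples
    trigger = []                         # consecutive suspicious samples
    events = []
    for i, value in enumerate(samples):
        if len(history) < history_len:
            history.append(value)        # still learning the baseline
            continue
        threshold = mean(history) - beta * pstdev(history)
        if value < threshold:
            trigger.append(value)
            if len(trigger) == trigger_len:
                events.append(i)         # sustained drop: raise an event
                history.clear()
                history.extend(trigger)  # restart the baseline at the new level
                trigger.clear()
        else:
            trigger.clear()              # isolated dip: treat as noise
            history.append(value)
    return events


# Synthetic throughput series: steady, one isolated dip, then a sustained drop.
series = [100] * 20 + [100, 20, 100] + [20] * 5
print(detect_drops(series))
```

With the synthetic series above, the isolated dip at index 21 is ignored and the event is raised at index 25, once three consecutive low samples have been seen.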
Looking at the pathchirp data in detail, it is clear that the effect of the event started between 18:41 and 18:57 on 8/15/05 and ended between 3:42 and 3:57pm on 8/16/05. Several other metrics (RTT, iperf, multi-stream iperf (miperf), and thrulay) also change at the time of the event.
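As a quick consistency check, the outage time reported by ESnet (15:44PT) can be compared with the onset window seen in the pathchirp data. The sketch below assumes that "PT" means Pacific Daylight Time (UTC-7) in August and that the pathchirp timestamps are in US Eastern Daylight Time (UTC-4), the local time at BNL; neither assumption is stated on this page.

```python
from datetime import datetime, timedelta, timezone

# Assumptions: fixed daylight-time offsets for August 2005.
PDT = timezone(timedelta(hours=-7), "PDT")  # assumed meaning of "PT"
EDT = timezone(timedelta(hours=-4), "EDT")  # assumed time zone of the BNL host

# Loss of connectivity as reported in the ESnet advisory.
outage_start = datetime(2005, 8, 15, 15, 44, tzinfo=PDT)
outage_start_eastern = outage_start.astimezone(EDT)

# Onset window read from the pathchirp data.
window_start = datetime(2005, 8, 15, 18, 41, tzinfo=EDT)
window_end = datetime(2005, 8, 15, 18, 57, tzinfo=EDT)

print(outage_start_eastern.strftime("%Y-%m-%d %H:%M %Z"))   # 2005-08-15 18:44 EDT
print(window_start <= outage_start_eastern <= window_end)   # True
```

Under these assumptions the reported loss of connectivity falls inside the 18:41 to 18:57 window in which the pathchirp measurements first show the event.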
The traceroute visualization also clearly shows the onset of instability after 18:00 hours.
The PingER loss data from monitoring sites to BNL also show increased packet loss from many monitoring sites in Japan, the UK, Germany, Hungary, Canada, and the US (both Internet2 and ESnet), typically rising from no loss to 1 or 2%. At the same time, two Italian sites appear to have lost connectivity to BNL for at least 30 minutes.