Case Study

Fiber Outage in Pakistan, June 27th 2005 to July 8th 2005

Pakistan’s sole undersea optical fiber link, the Southeast Asia, Middle East and Western Europe-3 cable (SEAMEWE-3), stopped working due to a fault from the 27th of June to the 8th of July 2005. This disruption cut off the global connectivity of almost 10 million internet users in the country (http://www.jang.com.pk/thenews/jun2005-daily/28-06-2005/main/main3.htm).

The PingER team carried out an analysis of the level of global connectivity and the performance of the country’s network infrastructure before, during and after the outage. The statistics and graphs are based entirely on PingER data, which comes from monitoring various nodes from SLAC and from other sites around the world. Some measurements have also been made from within Pakistan. The outage also gave us an opportunity to study the overall network infrastructure in the country, with a few very interesting results.

The data obtained is a result of monitoring the following hosts in Pakistan. Unfortunately, we do not have meaningful data for any host located in Karachi, which would have further broadened the scope of our research, by covering the three big cities in the country.

Nick Name	Node Name	Service Provider	Name of Institute/Node	Location of Host

PK.NIIT.EDU.N1	www.niit.edu.pk	NTC	NUST Institute of Information Technology	Rawalpindi
PK.QAU.EDU.N1	www.qau.edu.pk	NTC	Quaid-e-Azam University	Islamabad
PK.UET.EDU.N1	www.uet.edu.pk	NTC	University of Engineering and Technology, Lahore	Lahore
PK.PIEAS.EDU.N1	www.pieas.edu.pk	NTC	Pakistan Institute of Engineering and Applied Sciences	Islamabad
PK.DSL.NET.SVR.N2	mbl.dsl.net.pk	Micronet Broadband	Micronet Broadband	Islamabad
PK.DSL.NET.N1	www.dsl.net.pk	Micronet Broadband	Micronet Broadband	Islamabad
PK.DSL.NET.GTWY.N1	lo-0-gw.dsl.net.pk	Micronet Broadband	Micronet Broadband	Islamabad
PK.LCWU.EDU.N1	www.lcwu.edu.pk	Brain Net	Lahore College for Women University	Lahore
ORG.WB.SDNPK.N1	www.wb.sdnpk.org	Habib Rafiq	Area Development Programme, Balochistan	Rawalpindi

Table 1: Nodes involved in the monitoring in Pakistan. The service provider data has been deduced from traceroute results to these nodes.

Effect of Fiber Outage June-July 2005

The effect of the fiber outage is studied here in a fair amount of detail. Although backup satellite connectivity was provided, the quality of the backup link was, as expected, very poor. Moreover, full connectivity was expected to be restored in 3-4 days, but the outage actually lasted for 12 days (http://www.jang.com.pk/thenews/jun2005-daily/30-06-2005/main/main13.htm).

Figure 1: Median, 25th and 75th percentiles of packet loss from SLAC to the various institutes in Pakistan listed in Table 1.

The sudden increase in loss seen towards the end of June indicates that during the fiber outage, the reliability of the connectivity degraded to a huge extent. The packet loss from SLAC to Pakistan had increased from a few percent to over 10% and even beyond, before complete connectivity was lost. Quality levels for packet loss were set at 0-1% = good, 1-2.5% = acceptable, 2.5-5% = poor, 5-12% = very poor, and greater than 12% = bad (http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html#loss). The median value in Fig 1 indicates that the outage period falls under the “bad” category, but data from Feb’04 onwards shows that the overall connectivity of the Pakistani universities seen from SLAC in the US is poor. This, to some extent, also explains why real-time applications like streaming video and voice/video conferences over IP are of poor quality in the country.
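As a rough illustration, the loss quality levels quoted above (0-1% good, 1-2.5% acceptable, 2.5-5% poor, 5-12% very poor, above 12% bad) can be written as a small lookup. This is only a sketch: the function name `classify_packet_loss` is hypothetical and not part of PingER, and treating each upper bound as inclusive is an assumption, since the levels as quoted do not say which category a boundary value belongs to.

```python
def classify_packet_loss(loss_percent: float) -> str:
    """Map a packet-loss percentage to the quality level quoted in the text."""
    if not 0.0 <= loss_percent <= 100.0:
        raise ValueError("loss_percent must be between 0 and 100")
    # Upper bounds (in percent) for each level. Treating each bound as
    # inclusive is an assumption made for this sketch.
    levels = [
        (1.0, "good"),
        (2.5, "acceptable"),
        (5.0, "poor"),
        (12.0, "very poor"),
    ]
    for upper_bound, label in levels:
        if loss_percent <= upper_bound:
            return label
    return "bad"


# Illustrative values only, not measurements from the PingER data.
for loss in (0.5, 2.0, 4.0, 10.0, 20.0):
    print(f"{loss:5.1f}% -> {classify_packet_loss(loss)}")
```

With this mapping, any median loss above 12%, as seen during the outage, lands in the "bad" category.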

Figure 2: Minimum RTT measured by PingER from SLAC to the various universities in Pakistan.

As the main fiber link went down, there was a sudden jump in the minimum round trip times as well. This is mainly due to the change from a land-line link to a geostationary satellite link, which imposes an RTT of ~600 ms. This happened not just with one or two nodes but with every node located in Pakistan. Under normal circumstances, the traffic is routed via Singtel (Singapore).
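The size of this jump can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes a geostationary altitude of about 35,786 km, signals travelling at the speed of light in vacuum, and a satellite directly overhead of both ground stations. These figures and the function name are assumptions for illustration, not values taken from the PingER data. A ping and its reply each travel up to the satellite and back down, so the signal covers the altitude four times.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786.0         # assumed geostationary altitude above the equator


def min_geo_rtt_ms(altitude_km: float = GEO_ALTITUDE_KM) -> float:
    """Lower bound on the RTT over one geostationary hop, in milliseconds.

    Request: ground -> satellite -> ground; reply: the same path back,
    so the altitude is covered four times in total.
    """
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


print(f"Propagation floor: {min_geo_rtt_ms():.0f} ms")
```

Under these assumptions the propagation floor is a little under 480 ms. Slant paths to the satellite, the terrestrial segments at either end and equipment delays add to it, which is consistent with the ~600 ms quoted above.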

Figure 3: The ping unreachability[1] to Pakistan Universities seen from SLAC.

Usually, nodes in Pakistan are fairly reachable. However, there was a sudden increase in “Ping Unreachability” to Pakistan due to the fault. The big blue spike (PK.NIIT.EDU.N2) is not related to the fiber outage; this node was down for some time for other reasons. Towards the end of June, the unreachability soared as high as 18-19% for most nodes. During the outage, quite a few nodes in Pakistan were not reachable from the outside world.

RTT and Packet Loss from SLAC to NIIT

Fig 4

These graphs show that the link has suffered a large amount of packet loss. At least once a month, the losses go as high as 20% (small circles), occasionally reaching 100% (red circle). This shows the extreme unreliability of the link. The big yellow circle towards the end marks the fiber outage in Pakistan that paralyzed the country’s Internet users for almost twelve days. However, even apart from the outage, the country’s link has not been exemplary. With a lot of variation in the packet loss and a slight variation in the RTT values, serious efforts are required to improve the overall infrastructure. The sudden jump in the average RTT and packet loss values suggests that the satellite link was fairly congested. A possible reason is that call-center traffic in the country was being given the highest priority (http://www.jang.com.pk/thenews/jul2005-daily/02-07-2005/main/main18.htm).

Fig 5

The unreachability of pings from SLAC increased by a factor of 5-6 before the connection was totally disrupted, as shown by the red circle. The green circle is not due to the outage; this machine was simply down for some time. However, for PK.NIIT.EDU.N1, a monitored node at the NUST Institute of Information Technology, there have been many times when the unreachability has been fairly high. Looking deeper into the data, we noticed that the last week of May’05 had very high unreachability, especially the 28th of May. The overall network of the country also experienced problems between the 21st and 24th of May. This unreliability was evident in the case of NIIT, which is provided DSL service by NTC. Further investigation revealed that Quaid-e-Azam University (www.qau.edu.pk) and PIEAS (www.pieas.edu.pk) also faced a lot of problems. Both of these institutions are also provided broadband service by NTC.

Comparison of the Links to Various Universities during the Internet Outage

We also carried out a comparison of the overall reliability (based on average packet loss) from SLAC to various universities in the country. NIIT was found to have the most disrupted link during this period, with an average packet loss of around 20%. Packet loss of over 12% makes a link fairly unusable (http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html#loss).

Fig 6

Longer term performance of NTC links seen from US

There were some very interesting observations that resulted from the case study. NTC (National Telecommunication Corporation), the official service provider of the government of Pakistan, has a relatively poor infrastructure as compared to some other ISPs.

Fig 7

In these graphs, it is very evident that the NIIT, QAU and PIEAS nodes get a fairly poor quality of service. One of the nodes at Micronet Broadband (mbl.dsl.net.pk) is also being monitored, and its reaction to small outages is smaller than that of the NTC nodes. Two possible conclusions are that either there is some major problem with the NTC network, or the overall cabling in the country is not up to the mark. Whatever the reason, only NTC can provide a final answer. It is also worth mentioning that the average RTT values to the nodes served by NTC are approximately 10-15% higher than those to the nodes served by Micronet. This does not include NIIT, where the RTT values are highly inconsistent, with occasional spikes reaching 900 ms or beyond, and with more than half of the delay arising on the NIIT-NTC link. The fluttering route problem has been brought to the notice of NTC officials more than once (emails dated March 24th, April 5th, April 15th, and July 15th 2005), but unfortunately no action has been taken.

Fig 8

This is how the minimum round trip times from SLAC to various institutions in Pakistan have looked since we began monitoring nodes in the country in December. The RTT from the US West Coast to anywhere in Pakistan should be between 250 ms and 300 ms. NIIT has always been in the 400 ms-plus region. As mentioned, this has been pointed out to NTC many times, but to no avail. We see a sudden surge of spikes towards the end for all the nodes; this happened during the fiber outage. Unfortunately, we lost some data from NIIT during these days due to a power failure.

Longer term performance of Links within Pakistan

This outage also gave us an opportunity to explore the connectivity of various links from within Pakistan. They primarily fall into two categories:

1) Avg RTT value in the range of 100 ms. A possible explanation is that the ISP does not change along the path: for instance, the average RTT from NIIT (using NTC) to Quaid-e-Azam University, PIEAS and UET, all of which use the services of NTC, falls under this category.

2) Avg RTT value close to 300 ms. Whenever the ISP changes, we notice a huge jump in the average RTT values: for instance, the paths from NIIT (using NTC) to Micronet Broadband and to the Area Development Programme (using Habib Rafiq) fall under this category.

Fig 9

When we look at the graph of average RTT from NIIT to the rest of Pakistan, we again find some strange observations. The internal links of the country developed serious problems from April this year. It is also not good that nodes lying within Pakistan have an average RTT of over 100 ms, which stresses the need for improvement in the internal infrastructure of the country. For comparison, consider the US: the West Coast (SLAC via ESnet) and the East Coast (MIT via Abilene) are geographically around 3000 miles apart, yet have an average RTT of around 84-86 ms. Do not be surprised by the excellent performance shown by the yellow and blue lines; those hosts are located in the same lab.
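To put these numbers in perspective, one can estimate the lowest RTT that distance alone allows. The sketch below assumes light travels in optical fiber at about two thirds of its speed in vacuum (a refractive index of roughly 1.5) and that the fiber follows the straight-line distance. Both are simplifying assumptions, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.5       # assumed typical value for optical fiber
KM_PER_MILE = 1.609344


def fiber_rtt_floor_ms(distance_km: float) -> float:
    """Lower bound on the RTT over a fiber path of the given one-way length."""
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    # The factor of 2 accounts for the request and the reply.
    return 2 * distance_km / speed_in_fiber_km_s * 1000


# The roughly 3000-mile US coast-to-coast path mentioned in the text.
us_floor_ms = fiber_rtt_floor_ms(3000 * KM_PER_MILE)
print(f"US coast-to-coast propagation floor: {us_floor_ms:.0f} ms")
```

Under these assumptions the 3000-mile US path has a floor of about 48 ms, so the measured 84-86 ms is within a factor of two of it. By the same estimate, an RTT of 100 ms corresponds to roughly 6000 miles of fiber, far longer than any direct path between two cities in Pakistan.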

Fig 10

This graph shows the average RTT values from NIIT to the rest of the world. All values are based on averages over various nodes in different countries. For a more detailed overview of how these values are obtained, refer to the PingER project and the PingER Reports page. As we observe, April was when the country’s link actually began to degrade. Towards the end of the graph, at the end of June’05, we notice the effects of the outage. The red circle shows the variation in the country’s local link. This is one area where a lot of optimization has to take place, and it can only happen if all the ISPs, especially the broadband service providers, and the government collectively want to solve the problem.

The RTT values to different countries should be roughly proportional to the distance to them. Yet the average RTT to India is greater than that to Japan. This is because traffic from Pakistan goes to the USA, Japan and Hong Kong before reaching India (example: traceroute from http://monalisa.niit.edu.pk to http://dipp.nic.in).

Fig 11

Another common problem that Pakistan faces due to the poor internal infrastructure is the location of some of the local servers. These servers are hosted in other countries, which means that if anyone in Pakistan wants to access data, expensive International Bandwidth is consumed.

Different parameters (Average RTT, Minimum RTT, Throughput) over the last 60 days from SLAC to Pakistan

Fig 12

This graph shows the average RTT values over the last 60 days from SLAC to various universities in Pakistan. The spikes are a result of the recent internet outage in the country.

Fig 13

This graph shows the average packet loss percentages over the last 60 days from SLAC to various universities in Pakistan. The spikes towards the end are a result of the recent internet outage in the country.

Fig 14

This is the 60-day throughput graph from SLAC to various institutions in Pakistan. There was a sharp decrease for some nodes due to the outage.

Fig 15

This graph shows the percentiles of the throughput values shown above. The yellow line, which is the standard deviation, is very high, reflecting the inconsistency in the throughput obtained. Notice the few small dots between the 27th of June and the 7th of July.
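For readers who want to produce this kind of summary from their own measurements, the sketch below computes the 25th percentile, median, 75th percentile and standard deviation of a list of throughput samples with Python's standard `statistics` module. The function name, the sample values and the choice of the "inclusive" quantile method are assumptions for illustration; the text does not state how PingER computes its percentiles.

```python
import statistics


def summarize_throughput(samples):
    """Return quartiles and sample standard deviation of throughput samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # n=4 splits the data into quartiles: 25th, 50th and 75th percentiles.
    p25, median, p75 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "p25": p25,
        "median": median,
        "p75": p75,
        "stdev": statistics.stdev(samples),
    }


# Made-up throughput samples (kbit/s), purely for illustration.
summary = summarize_throughput([100, 200, 300, 400, 500])
print(summary)
```

A standard deviation that is large relative to the median, as in the graph, is one simple sign that the throughput is inconsistent.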

Fig 16

This graph is for the last 60 days from SLAC. It shows that among the Pakistani institutes, NIIT and Quaid-e-Azam University have greater than average packet losses.

Comparison between the Major Broadband Service Providers in Pakistan

Based on our findings, we prepared a brief comparison of the services of the major service providers in Pakistan. Although we have data from four ISPs, we narrowed our research to only two, viz. NTC and Micronet Broadband, because more data is available from the nodes under them. These two are also the service providers to most of the universities in Pakistan.

Fig 17

As is evident, the data for the last 60 days shows that NTC has had much higher packet loss than Micronet. These are currently the major service providers to most of the universities that we are monitoring. It is also worth PERN considering whether, if they are availing the facility from NTC, it is the best and most reliable service in Pakistan. Of the 9 institutes that we were monitoring, NTC was serving 4, Micronet 3, and Habib Rafiq and Brain Net one each. The details are given above in Table 1.

Fiber Outage in March 05

While digging through the data, we also studied the impact of the internet outage in March 05 (http://www.jang.com.pk/thenews/mar2005-daily/26-03-2005/metro/i8.htm), which occurred when the power cable of Pakistan Telecommunication Company Limited (PTCL) was disconnected by the "excavators of Karachi Water and Sewerage Board hired by SITE Limited Factory".

Fig 18

The yellow circle highlights the three-hour outage (http://www.jang.com.pk/thenews/mar2005-daily/26-03-2005/metro/i8.htm), but the red circles are impossible to explain. This shows that even when the cable is fine, the internet quality is fairly low. Note that a packet loss of greater than 4% on a regular basis indicates that the link quality to a node is poor. For further information on packet loss comparisons, see the PingER Tutorial. We have spikes going as high as 30-40% during some parts of any given day.

Fig 19

The most shocking of them all: this is happening within NIIT, between hosts in the same lab. It is hard to believe that the PTCL outage was felt within NIIT too.

Fig 20

This is how the outage should have looked: a three-hour cut and then smooth. There are occasional spikes, but not as many as we see below.

Fig 21

Brazil to NIIT, during the March outage.

Fig 22

I can understand that there was an outage, but what’s most shocking is the performance before and after the outage.

Fig 23

Since the data was very surprising due to the nature of these spikes, we dug deeper and confirmed that NTC feels the maximum impact of all these outages. Even when there are no outages, NTC customers experience outage-like disruptions every now and again.

Fig 24

NIIT, which is served by NTC (National Telecommunication Corporation, the official IT&T service provider to the Government of Pakistan and a subsidiary of PTCL), seems to have been hit worst by the outage. It appears to have taken quite a few days to recover.

Prepared by: Les Cottrell and Aziz Allaudin Rehmatullah

[1] By looking at the ping data to identify 30-minute periods when no ping responses were received from a given host, one can identify when the host was down. Using this information, one can calculate ping unreachability = (# periods with node down / total number of periods).
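The definition in footnote [1] can be sketched in a few lines of Python. The function name and the input format (one count of received ping responses per 30-minute period) are assumptions for illustration and do not describe PingER's actual implementation.

```python
def ping_unreachability(responses_per_period):
    """Fraction of 30-minute periods in which no ping response was received.

    responses_per_period: one integer per 30-minute period, giving the
    number of ping responses received from the host in that period.
    """
    if not responses_per_period:
        raise ValueError("at least one period is required")
    periods_down = sum(1 for count in responses_per_period if count == 0)
    return periods_down / len(responses_per_period)


# Made-up example: 8 periods, 2 of them with no responses at all.
example = [10, 9, 0, 10, 0, 8, 10, 10]
print(f"Unreachability: {ping_unreachability(example):.1%}")
```

Here 2 of the 8 periods have no responses, so the unreachability is 25%. A period with heavy loss but at least one response still counts as reachable under this definition.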