PingER Measurement Pathology Examples

Page created May 23, 1999 by: Les Cottrell. Last Update July 16 2002.


Introduction

While using ping to measure LAN performance, we encountered a pathology when pinging from doris (Linux) to atreides (Windows NT). In the plot of ping RTT versus ping sequence number, the pings with RTT > 10 msec were separated by unusually regular intervals in sequence number; see the highlighted link for Linux to WNT (atreides) below. The effect was reproducible.
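One way to check for this kind of regularity without a plot is to list the sequence-number gaps between consecutive slow pings. The following is a minimal sketch, not the procedure used for the plots on this page: the function name `slow_ping_gaps` and the sample RTT values are hypothetical, and only the 10 msec threshold comes from the text above.

```python
def slow_ping_gaps(rtts_ms, threshold_ms=10.0):
    """Return the sequence-number gaps between consecutive slow pings.

    rtts_ms is a list indexed by ping sequence number; an entry of None
    stands for a lost ping and is ignored.
    """
    slow = [seq for seq, rtt in enumerate(rtts_ms)
            if rtt is not None and rtt > threshold_ms]
    return [later - earlier for earlier, later in zip(slow, slow[1:])]


# Hypothetical RTTs in msec, for illustration only (not measured data).
sample = [0.6, 0.5, 12.3, 0.6, 0.5, 0.7, 11.8, 0.6, 0.5, 0.6, 12.1]
print(slow_ping_gaps(sample))
```

If the gaps cluster tightly around one or two values, the slow pings are regularly spaced in sequence number, which is the signature described above.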

This page indexes some of the more interesting plots made while tracking down the pathology. When Linux sent the ping requests, the requesting host was doris.slac.stanford.edu, running Redhat Linux 5.2. The Windows NT (WNT) hosts were running Windows NT 4 with Service Pack (SP) 4. Some of the PC hosts could dual boot, so they may appear as running either WNT or Linux. The Sun host (gryphon) was running Solaris 5.6. Host names are given in parentheses; all are in the .slac.stanford.edu domain and all, including doris, are on the same subnet (PUB6). All hosts are also on 10 Mbps shared Ethernet hubs. Unless otherwise noted, the ping application was the standard version delivered with the operating system. The pings were sent once a second, had a timeout of 20 seconds, and contained 100 bytes. Unless otherwise noted, all graphs show the ping sequence number along the x-axis and the ping round trip time (RTT) in milliseconds along the y-axis.

Pinging involving PCs

Ping from Linux to WNT

WNT to Linux

Linux to Linux

Ping between Linux and Sun (gryphon)

Ping from Linux to self

Network connections

Host               Switch port  IP address     MAC address        OS
atlas/dhcp-24-179  CGB3: 3/4    134.79.24.179  00-10-04-f5-f5-53  WNT 4/SP4
atreides           CGB3: 3/16   134.79.24.12   00-c0-4f-76-18-36  WNT 4/SP4
doris              CGB3: 3/7    134.79.24.122  00-c0-4f-98-6b-f7  Linux Redhat 5.2
eccles             CGB3: 3/16   134.79.24.95   00-c0-4f-a3-8d-04  Linux Redhat 5.2
gryphon            CGB3: 3/5    134.79.25.130  08-00-20-22-ed-4b  Solaris 5.6
hector             CGB3: 3/2    134.79.24.97   00-60-97-cc-50-26  WNT 4/SP4
odin               CGB3: 3/6    134.79.24.46   00-c0-4f-a3-b8-51  Linux Redhat 5.2
procrustes         CGB3: 3/8    134.79.24.84   00-c0-4f-b9-6a-65  WNT 4/SP4
yemint             CGB3: 3/7    134.79.24.86   00-c0-4f-c2-77-cd  WNT 4/SP4

Possible resolution

On Tuesday July 15, '02 I received the following email from Stephan Bohacek [bohacek@math.usc.edu]:
Hi Les, We have been doing extensive high frequency ping measurements. We had noticed a similar effect you noted in http://www.slac.stanford.edu/comp/net/wan-mon/pathology-eg.html. However, we have since determined that this is due to the operating system stalling. For example, our windows machines stalls every 300 ms for 16 ms. Thus, a packet will be delayed anywhere from 0 to 16 ms depending how far along the stall is when the packet arrives. Since the stalling is periodic and the pings are sent periodically, the delay pattern can be quite complex (as you noted). We have fixed the problem by using real-time operating systems (RTAI a real time linux). We are writing up some details. I'll send them on when they are complete.
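The stalling explanation can be illustrated with a small simulation. This is a sketch under stated assumptions, not Bohacek's analysis: the 300 ms stall period and 16 ms stall length are taken from the email, while the function names, the alignment of the first stall with time zero, and the 1 ms drift in the ping interval are hypothetical choices made for illustration. Times are integer microseconds to avoid floating point rounding.

```python
def stall_delay_us(arrival_us, stall_period_us=300_000, stall_len_us=16_000):
    """Extra delay for a packet arriving at arrival_us.

    Assumes the host is stalled for the first stall_len_us of every
    stall_period_us, and that a packet arriving during a stall waits
    until the stall ends.
    """
    phase = arrival_us % stall_period_us
    return stall_len_us - phase if phase < stall_len_us else 0


def simulate(n_pings, ping_interval_us):
    """Stall delay (in microseconds) seen by each ping, by sequence number."""
    return [stall_delay_us(seq * ping_interval_us) for seq in range(n_pings)]


# Hypothetical sender: nominally one ping per second, but 1 ms slow,
# so the arrival phase drifts relative to the stall cycle.
delays = simulate(600, 1_001_000)
delayed_seqs = [seq for seq, d in enumerate(delays) if d > 0]
print(delayed_seqs[:10])
print(max(delays) / 1000.0, "ms maximum extra delay")
```

Because both the pings and the stalls are periodic, which sequence numbers are delayed depends on how the two periods beat against each other; changing the assumed drift changes the spacing of the delayed pings.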