Poor Inbound Network Performance Found Using NDT
Les Cottrell, Antonio Cessariccu, Yee Ting Li, August 10, 2007
Reported Problem
While testing NDT to an
NDT server at SLAC (min/avg/max/stdev ping RTT from pinger = 0.109/0.327/0.601/0.176 ms)
from clients on the same LAN at SLAC, we noticed
a large asymmetry between outbound (client-to-server) and inbound (server-to-client) TCP
throughput performance.
Typical values were 94Mb/s outbound and 375kb/s inbound. NDT also provided
a warning "Old Duplex mismatch condition detected: [S2C]: Excessive packet
queuing detected."
The more detailed
statistics showed many retransmissions,
duplicate ACKs and excessive packet queuing. Since the NDT server was on the
LAN the RTT was small (4 msec. measured by NDT). The client (atlas)
was running Windows XP, version 5.1 with a 1000Mb/s NIC. The NIC was
configured for 100Mb/s fixed. The NIC was a Broadcom NetXtreme
57xx Gigabit controller and the driver reported no known problems.
The NIC was connected to a Cisco 5000 port (1/12 F-100 Fixed).
Turning on or off flow control in the driver had no effect.
Tests to the NDT server at
Stanford (min/avg/max/stdev ping RTT from pinger = 0.444/0.869/1.181/0.194 ms)
on the other hand gave ~93Mb/s outbound and 95 Mb/s inbound and
no mention of Duplex mismatch. The
statistics showed no loss, no
retransmits, no duplicate ACKs.
Further Tests
A nearby desktop (net-desk2) on the same Cisco 5000 switch (port 3/15
F-100 Auto, WSX5239)
with an Intel NIC running Linux achieved ~90Mb/s out and in-bound.
We moved net-desk2 to a WSX5224 port on the Cisco router and it still performed
well (see the
statistics).
Osiras running on the same Cisco 5000 switch port (F-100 auto, WSX5224)
with an Intel NIC running Linux achieved ~90Mb/s out and ~40Mb/s inbound.
However the
statistics reported
the Duplex mismatch problem, many retransmissions and duplicate ACKs.
Another desktop (scylla) running Linux with a Broadcom NIC on the same
Cisco (port 8/8 F-100 Auto) achieved 94Mbps/42Mbps and complained "Old
Duplex-Mismatch condition detected". This is much better than atlas but still
very asymmetric (like Osiras). We were unable to find the NIC card/driver type.
Host iepm-desk, running Windows with a Broadcom NetXtreme 57xx Gigabit controller but
connected at 10Mb/s, got no duplex warning. However, outbound and inbound
were 6.86Mb/s and 175.2kb/s (very asymmetric). The
statistics reported "Normal duplex operation"
with 195 retransmits and 297 duplicate ACKs, and there was excessive packet
queuing inbound.
We moved atlas to a Cisco 3750 with 10/100Mbps ports.
Both the atlas NIC and the Cisco port
were configured for autonegotiation. The NDT test was repeated.
We achieved similar performance to before (94Mb/s outbound, 384kb/s inbound).
We moved atlas to a Cisco 6900 switch/router with a
SUP720. Both the switch and NIC were
configured for 100Mb/s autonegotiation with the switch port limited to
100Mb/s. The NDT test was repeated with similar results. We then
removed the 100Mb/s limit and the ports autonegotiated for
1000Mb/s. This time the NDT test gave 585Mb/s and 431 kb/s but no
warnings.
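The asymmetry in the tests above is easier to compare once all throughputs are put in the same units. The following is a minimal Python sketch, not part of the original measurements tooling: it copies the figures quoted in the text, converts them to Mb/s, and computes an outbound/inbound ratio per test. The names MEASUREMENTS, to_mbps, asymmetry and report are made up for illustration, the "~" values are treated as exact, and the 585Mb/s and 431 kb/s pair is assumed to be in outbound, inbound order like the other results.

```python
# Throughputs quoted in the text as ((outbound, unit), (inbound, unit)).
# Assumption: "~90Mb/s" style values are treated as exact numbers.
MEASUREMENTS = {
    "atlas to Stanford NDT": ((93, "Mb/s"), (95, "Mb/s")),
    "net-desk2": ((90, "Mb/s"), (90, "Mb/s")),
    "Osiras": ((90, "Mb/s"), (40, "Mb/s")),
    "scylla": ((94, "Mb/s"), (42, "Mb/s")),
    "iepm-desk (10Mb/s link)": ((6.86, "Mb/s"), (175.2, "kb/s")),
    "atlas (Cisco 3750)": ((94, "Mb/s"), (384, "kb/s")),
    # Assumption: the text's "585Mb/s and 431 kb/s" is outbound, inbound.
    "atlas (Cisco 6900, 1000Mb/s)": ((585, "Mb/s"), (431, "kb/s")),
}

UNIT_TO_MBPS = {"kb/s": 1e-3, "Mb/s": 1.0}


def to_mbps(value, unit):
    """Convert a (value, unit) throughput to Mb/s."""
    return value * UNIT_TO_MBPS[unit]


def asymmetry(outbound, inbound):
    """Ratio of outbound to inbound throughput; 1.0 means symmetric."""
    return to_mbps(*outbound) / to_mbps(*inbound)


def report():
    """Return (test, ratio) pairs, most asymmetric first."""
    rows = [(name, asymmetry(out, inb)) for name, (out, inb) in MEASUREMENTS.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in report():
        print(f"{name:30s} out/in = {ratio:8.1f}")
```

On these figures the Windows tests on atlas and iepm-desk show ratios in the tens to thousands, while the Linux hosts stay near 1 to 2.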
Conclusions
A table of some of the statistics is available
here.
The problem does not appear to be related to the Cisco switch or the particular blade
(it appeared on 3 different Cisco switches and many different blades, it also did
not occur with two different blades with the same host).
There appear to be multiple related problems. With most hosts we are seeing
losses and retries. Linux may do better responding to these. It is possible the
Broadcom NIC does not perform as well as the Intel.
The
problem also does not appear on the Intel NIC. It appears it may be with the
Broadcom NIC/driver.
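One way to weigh the operating system against the NIC vendor is to group the outbound/inbound ratios from the SLAC NDT tests by each attribute. The sketch below is illustrative only: the HOSTS table is transcribed from the text (atlas uses the Cisco 3750 figures, "~" values are treated as exact, and scylla is listed as Broadcom as first stated), and ratios_by is a hypothetical helper. With only five hosts it cannot separate the two factors conclusively.

```python
from collections import defaultdict

# (host, OS, NIC vendor, outbound Mb/s, inbound Mb/s), as quoted in the text.
HOSTS = [
    ("atlas", "Windows", "Broadcom", 94.0, 0.384),
    ("iepm-desk", "Windows", "Broadcom", 6.86, 0.1752),
    ("scylla", "Linux", "Broadcom", 94.0, 42.0),
    ("Osiras", "Linux", "Intel", 90.0, 40.0),
    ("net-desk2", "Linux", "Intel", 90.0, 90.0),
]

FIELD_INDEX = {"os": 1, "nic": 2}


def ratios_by(field):
    """Group outbound/inbound ratios by 'os' or 'nic'."""
    index = FIELD_INDEX[field]
    groups = defaultdict(list)
    for row in HOSTS:
        outbound, inbound = row[3], row[4]
        groups[row[index]].append(outbound / inbound)
    return dict(groups)


if __name__ == "__main__":
    for field in ("os", "nic"):
        for key, ratios in ratios_by(field).items():
            formatted = ", ".join(f"{r:.1f}" for r in ratios)
            print(f"{field}={key}: out/in ratios {formatted}")
```

In this small sample every Linux ratio is lower than every Windows ratio, and both Windows hosts use the Broadcom NIC, so the OS and NIC effects overlap.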