Poor Inbound Network Performance Found Using NDT

Les Cottrell, Antonio Cessariccu, Yee Ting Li, August 10, 2007

Reported Problem

While testing NDT against an NDT server at SLAC (min/avg/max/stdev ping RTT from pinger = 0.109/0.327/0.601/0.176 ms) from clients on the same LAN at SLAC, we noticed a large asymmetry between outbound (client-to-server) and inbound (server-to-client) TCP throughput. Typical values were 94 Mb/s outbound and 375 kb/s inbound. NDT also issued the warning "Old Duplex mismatch condition detected: [S2C]: Excessive packet queuing detected." The more detailed statistics showed many retransmissions, many duplicate ACKs and excessive packet queuing. Since the NDT server was on the LAN, the RTT was small (4 ms as measured by NDT).

The client (atlas) was running Windows XP (version 5.1) with a 1000 Mb/s NIC configured for a fixed 100 Mb/s. The NIC was a Broadcom NetXtreme 57xx Gigabit controller, and its driver reported no known problems. The NIC was connected to a Cisco 5000 port (1/12 F-100 Fixed). Turning flow control on or off in the driver had no effect.

By contrast, tests to the NDT server at Stanford (min/avg/max/stdev ping RTT from pinger = 0.444/0.869/1.181/0.194 ms) gave ~93 Mb/s outbound and 95 Mb/s inbound, with no mention of a duplex mismatch. The statistics showed no loss, no retransmits and no duplicate ACKs.

Further Tests

A nearby desktop (net-desk2) on the same Cisco 5000 switch (port 3/15 F-100 Auto, WSX5239), with an Intel NIC running Linux, achieved ~90 Mb/s both outbound and inbound.

We moved net-desk2 to a WSX5224 port on the Cisco router and it still performed well (see the statistics).

Osiras, running Linux with an Intel NIC on the same Cisco 5000 switch port (F-100 auto, WSX5224), achieved ~90 Mb/s outbound but only ~40 Mb/s inbound. Unlike net-desk2, the statistics reported the duplex mismatch problem, with many retransmissions and duplicate ACKs.

Another desktop (scylla), running Linux with a Broadcom NIC on the same Cisco switch (port 8/8 F-100 Auto), achieved 94 Mb/s outbound and 42 Mb/s inbound, and NDT reported "Old Duplex-Mismatch condition detected". This is much better than atlas but still very asymmetric (like Osiras). We were unable to determine the exact NIC model or driver version.

Host iepm-desk, running Windows with a Broadcom NetXtreme 57xx Gigabit controller but connected at 10 Mb/s, got no duplex warning. However, outbound and inbound throughput were 6.86 Mb/s and 175.2 kb/s (very asymmetric). The statistics reported "Normal duplex operation" but also showed 195 retransmits, 297 duplicate ACKs and excessive packet queuing inbound.
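The iepm-desk result shows that NDT's duplex verdict and the raw symptoms can disagree. A screen on the symptoms themselves (a large throughput asymmetry coinciding with retransmits or duplicate ACKs) can be sketched as below. The function names and the 10x threshold are hypothetical; this is a rule of thumb for sorting results, not NDT's own duplex-mismatch algorithm.

```python
def asymmetry(outbound_mbps: float, inbound_mbps: float) -> float:
    """Faster direction divided by slower direction (1.0 = symmetric)."""
    return max(outbound_mbps, inbound_mbps) / min(outbound_mbps, inbound_mbps)


def suspect_inbound_problem(outbound_mbps: float, inbound_mbps: float,
                            retransmits: int, dup_acks: int,
                            ratio_threshold: float = 10.0) -> bool:
    """Hypothetical rule of thumb (NOT NDT's heuristic): flag a large
    throughput asymmetry that coincides with signs of loss.
    The default 10x threshold is an arbitrary illustrative choice."""
    lossy = retransmits > 0 or dup_acks > 0
    return asymmetry(outbound_mbps, inbound_mbps) >= ratio_threshold and lossy


if __name__ == "__main__":
    # iepm-desk: 6.86 Mb/s out, 175.2 kb/s in, 195 retransmits, 297 duplicate ACKs
    print(suspect_inbound_problem(6.86, 0.1752, 195, 297))  # True
    # atlas to the Stanford server: ~93 Mb/s out, 95 Mb/s in, no retransmits or dup ACKs
    print(suspect_inbound_problem(93, 95, 0, 0))            # False
```

Applied to iepm-desk, the check fires (a ratio of about 39 with loss present) even though NDT reported normal duplex operation.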

We moved atlas to a Cisco 3750 with 10/100 Mb/s ports. Both the atlas NIC and the Cisco port were configured for autonegotiation. The NDT test was repeated, with similar performance to before (94 Mb/s outbound, 384 kb/s inbound).
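The Cisco 3750 test targets the most common configuration cause of a duplex mismatch. The sketch below is a simplified model of IEEE 802.3 behaviour, assuming every autonegotiating port advertises full-duplex capability: when both ends autonegotiate they settle on full duplex, whereas an autonegotiating port facing a port with autonegotiation disabled falls back to parallel detection, which can sense the speed but must assume half duplex. The names `Port`, `resolved_duplex` and `is_mismatch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Port:
    autoneg: bool
    duplex: str = "full"  # forced duplex; only used when autoneg is False


def resolved_duplex(local: Port, peer: Port) -> str:
    """Duplex the local end ends up using (simplified IEEE 802.3 model).

    Assumes every autonegotiating port advertises full-duplex capability.
    """
    if not local.autoneg:
        return local.duplex  # a forced setting is used as-is
    if peer.autoneg:
        return "full"        # both ends negotiate the best common mode
    return "half"            # parallel detection: speed is sensed, duplex defaults to half


def is_mismatch(a: Port, b: Port) -> bool:
    """True if the two ends of a link end up with different duplex settings."""
    return resolved_duplex(a, b) != resolved_duplex(b, a)


if __name__ == "__main__":
    switch = Port(autoneg=True)
    print("both autonegotiate:", is_mismatch(Port(autoneg=True), switch))
    print("NIC forced full, switch auto:", is_mismatch(Port(autoneg=False), switch))
```

Under this model, the configuration used on the Cisco 3750 (both ends autonegotiating) cannot by itself produce a mismatch, so the persistent asymmetry in that test points away from a simple configuration error.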

We moved atlas to a Cisco 6900 switch/router with a SUP720. Both the switch port and the NIC were set to autonegotiate, with the switch port limited to 100 Mb/s. The NDT test was repeated with similar results. We then removed the 100 Mb/s limit and the ports autonegotiated to 1000 Mb/s. This time the NDT test gave 585 Mb/s outbound and 431 kb/s inbound, but no warnings.
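The throughputs reported above can be collected and compared as an outbound/inbound ratio, which makes the spread of the asymmetry easier to see. The sketch below only reuses numbers quoted in this note (kb/s values converted to Mb/s, pairs listed outbound first as in the text); the names `RESULTS`, `out_in_ratio` and `summarize` are hypothetical.

```python
# Throughputs from the tests above, in Mb/s (kb/s values divided by 1000).
# A ratio above 1 means inbound (server-to-client) is slower than outbound.
RESULTS = [
    # (host, setup, outbound_mbps, inbound_mbps)
    ("atlas", "Cisco 5000, to Stanford server", 93.0, 95.0),
    ("net-desk2", "Cisco 5000, Intel NIC, Linux", 90.0, 90.0),
    ("Osiras", "Cisco 5000, Intel NIC, Linux", 90.0, 40.0),
    ("scylla", "Cisco 5000, Broadcom NIC, Linux", 94.0, 42.0),
    ("iepm-desk", "10 Mb/s, Broadcom NIC, Windows", 6.86, 0.1752),
    ("atlas", "Cisco 3750, autonegotiated", 94.0, 0.384),
    ("atlas", "Cisco 6900, 1000 Mb/s", 585.0, 0.431),
]


def out_in_ratio(outbound_mbps: float, inbound_mbps: float) -> float:
    """Outbound/inbound throughput ratio; 1.0 means symmetric."""
    return outbound_mbps / inbound_mbps


def summarize(results):
    """Return (host, setup, ratio) rows sorted by increasing ratio."""
    rows = [(host, setup, out_in_ratio(o, i)) for host, setup, o, i in results]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for host, setup, ratio in summarize(RESULTS):
        print(f"{host:<10} {setup:<32} out/in = {ratio:8.1f}")
```

Sorting by the ratio puts the symmetric results (atlas to Stanford, net-desk2) at one end and the atlas runs on the Cisco 3750 and 6900 at the other, where inbound throughput is more than two orders of magnitude below outbound.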

Conclusions

A table of some of the statistics is available here. The problem does not appear to be related to the Cisco switch or to the particular blade: it appeared on 3 different Cisco switches and many different blades, and it did not occur with two different blades for the same host. There appear to be multiple related problems. With most hosts we are seeing losses and retries, and Linux may be better at responding to these. It is possible that the Broadcom NIC does not perform as well as the Intel one. The problem also does not appear on the Intel NIC. It appears the problem may lie with the Broadcom NIC/driver.