Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks
Prepared by Les Cottrell & Hadrien Bullot, SLAC & EPFL, for the BaBar/SCS collaboration
November 2003
Test new advanced TCP stacks and see how they perform on short- and long-distance real production WAN links
Compare & contrast: ease of configuration, throughput, convergence, fairness, stability, etc.
For different RTTs, window sizes, and txqueuelen settings
Recommend “optimum” stacks for data-intensive science (BaBar) transfers using bbftp, bbcp, GridFTP
Validate simulator & emulator findings & provide feedback
TCP only
No rate-based transport protocols (e.g. SABUL, UDT, RBUDP) at the moment
No iSCSI or FC over IP
Sender mods only; the HENP model is a few big senders, lots of smaller receivers
Simplifies deployment: only a few hosts at a few sending sites
No DRS (Dynamic Right-Sizing)
Runs on production nets
No router mods (XCP/ECN), no jumbo frames
Linux 2.4 New Reno with SACK: single and parallel streams (P-TCP)
Scalable TCP (S-TCP)
FAST TCP
HighSpeed TCP (HS-TCP)
HighSpeed TCP Low Priority (HSTCP-LP)
Binary Increase Control TCP (Bic-TCP)
Hamilton TCP (H-TCP)
Low performance on fast long-distance paths
AIMD: additive increase adds a = 1 packet to cwnd per RTT; multiplicative decrease halves cwnd (b = 0.5) on congestion
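To make the slow recovery concrete, here is a minimal, idealized sketch of the Reno AIMD update (cwnd counted in segments; a simplification, not the kernel code):

```python
def aimd_update(cwnd: float, loss: bool, a: float = 1.0, b: float = 0.5) -> float:
    """Idealized TCP Reno AIMD step, applied once per RTT (cwnd in segments)."""
    if loss:
        return cwnd * b      # multiplicative decrease: halve the window
    return cwnd + a          # additive increase: one more segment per RTT

# On a 1 Gbit/s, 170 ms path the full window is ~14,000 x 1500-byte segments;
# after one loss, growing back at 1 segment/RTT takes ~7,000 RTTs (~20 minutes).
cwnd = 14000.0
cwnd = aimd_update(cwnd, loss=True)   # -> 7,000 segments
```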
TCP Reno with 16 streams
Parallel streams are heavily used in HENP & elsewhere to achieve the needed performance, so this is today’s de facto baseline
However, it is hard to optimize both the window size AND the number of streams, since the optimal values vary with changes in network capacity, routes, or utilization
20-minute tests, long enough to see stable patterns
Iperf reports incremental and cumulative throughputs at 5-second intervals
Ping interval ~100 ms
At sender: 1st machine for iperf/TCP, 2nd for cross-traffic (UDP or TCP), 3rd for ping
At receiver: 1 machine for ping (echo) and TCP, 2nd for cross-traffic
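A minimal sketch of how one such run could be scripted (a hypothetical wrapper: the host name is made up, an iperf server is assumed to be already listening on the receiver, and sub-200 ms ping intervals need root on Linux):

```python
import subprocess

RECEIVER = "receiver.example.org"  # hypothetical; not one of the actual test hosts

# 20-minute iperf run: 5-second reports, 32 MB window, 16 parallel streams (P-TCP baseline)
iperf = subprocess.Popen(
    ["iperf", "-c", RECEIVER, "-t", "1200", "-i", "5", "-w", "32M", "-P", "16"],
    stdout=open("iperf.log", "w"),
)

# Concurrent ping at ~100 ms intervals to track RTT under load
ping = subprocess.Popen(
    ["ping", "-i", "0.1", RECEIVER],
    stdout=open("ping.log", "w"),
)

iperf.wait()      # wait out the 20-minute transfer
ping.terminate()  # then stop the RTT probe
```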
3 main network paths
Short distance: SLAC-Caltech (RTT ~10 ms)
Middle distance: UFL and DataTAG Chicago (RTT ~70 ms)
Long distance: CERN and University of Manchester (RTT ~170 ms)
Tests run during nights and weekends to avoid unacceptable impact on production traffic
Set large maximum windows (typically 32 MB) on all hosts
Used 3 different windows with iperf:
Small window size, a factor of 2-4 below optimal
Roughly optimal window size (~BDP = bandwidth × RTT; see the sketch below)
Oversized window
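For reference, a small sketch of the BDP arithmetic behind these window choices, assuming nominal 1 Gbit/s paths and the three RTTs above (illustrative numbers only):

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return bandwidth_bps / 8 * rtt_s

GBPS = 1e9
for path, rtt_ms in [("SLAC-Caltech", 10), ("Chicago", 70), ("CERN/Manchester", 170)]:
    mb = bdp_bytes(GBPS, rtt_ms / 1000) / 2**20
    print(f"{path:16s} RTT ~{rtt_ms:3d} ms -> BDP ~{mb:5.1f} MB")

# SLAC-Caltech     RTT ~ 10 ms -> BDP ~  1.2 MB
# Chicago          RTT ~ 70 ms -> BDP ~  8.3 MB
# CERN/Manchester  RTT ~170 ms -> BDP ~ 20.3 MB  (hence the 32 MB maximum windows)
```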
Only P-TCP appears to dramatically affect the RTT
E.g. it increases the RTT by ~200 ms (a factor of ~20 on the short, ~10 ms path)
Important to understand how fair a protocol is
For one protocol competing against the same protocol (intra-protocol), we define the fairness for a single bottleneck as:
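Assuming the standard Jain fairness index over the average throughputs x_i of the n competing flows, this would be:

```latex
F = \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n \sum_{i=1}^{n} x_i^{2}}
```

F = 1 when all flows see identical throughput, and falls toward 1/n as one flow dominates.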
All protocols have good intra-protocol fairness (F > 0.98)
Except HS-TCP (F < 0.94) when the window size > optimal
For inter-protocol fairness we introduce the asymmetry between the two throughputs:
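Assuming the usual symmetric form for such an asymmetry measure (consistent with the definition that follows), this would be:

```latex
A = \frac{x_1 - x_2}{x_1 + x_2}
```

A = 0 then indicates perfectly fair sharing, while |A| → 1 indicates one stack starving the other.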
where x1 and x2 are the throughput averages of TCP stack 1 competing with TCP stack 2
Advanced stacks behave like single-stream TCP Reno on short distances for up to Gbit/s paths, especially if the window size is limited
Single-stream TCP Reno has low performance and is unstable on long distances
P-TCP is very aggressive and impacts the RTT badly
The new TCP stacks work well with a single stream; most are fair and stable
Ready to put into use with production applications
TCP Stacks Evaluation:
www-iepm.slac.stanford.edu/bw/tcp-eval/