Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks
Prepared by Les Cottrell & Hadrien Bullot, SLAC & EPFL, for the
BaBar/SCS meeting
November 11, 2003
www.slac.stanford.edu/grp/scs/net/talk03/babar-long-nov03.html
Test new advanced TCP stacks, see how they perform on short and long-distance real production WAN links
Compare & contrast: ease of configuration, throughput, convergence, fairness, stability etc.
For different RTTs, windows, txqueuelen
Recommend “optimum” stacks for data-intensive science (BaBar) transfers using bbftp, bbcp, GridFTP
Validate simulator & emulator findings & provide feedback
TCP only
No rate-based transport protocols (e.g. SABUL, UDT, RBUDP) at the moment
No iSCSI or FC over IP
Sender mods only; the HENP model is a few big senders, lots of smaller receivers
Simplifies deployment: only a few hosts at a few sending sites
No DRS (Dynamic Right-Sizing)
Runs on production nets
No router mods (XCP/ECN), no jumbo frames
Linux 2.4 New Reno with SACK: single and parallel streams (P-TCP)
Scalable TCP (S-TCP)
Fast TCP
HighSpeed TCP (HS-TCP)
HighSpeed TCP Low Priority (HSTCP-LP)
Binary Increase Control TCP (Bic-TCP)
Hamilton TCP (H-TCP)
Low performance on fast long-distance paths
AIMD (add a=1 packet to cwnd per RTT; decrease cwnd by factor b=0.5 on congestion)
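The AIMD rule above can be sketched as a minimal per-RTT model (not the kernel implementation):

```python
def aimd_update(cwnd, loss, a=1.0, b=0.5):
    """One RTT of TCP Reno congestion avoidance:
    add a packets to cwnd per RTT; on congestion, cut cwnd by factor b."""
    if loss:
        return max(1.0, cwnd * (1.0 - b))  # multiplicative decrease
    return cwnd + a                        # additive increase

# From cwnd = 100 packets: no loss -> 101, loss -> 50
```

On a 170 ms RTT path this linear recovery of 1 packet per RTT is what makes Reno so slow to regain a large window after a single loss.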
TCP Reno with 16 streams
Parallel streams are heavily used in HENP & elsewhere to achieve needed performance, so this is today’s de facto baseline
However, it is hard to optimize both the window size AND the number of streams, since the optimal values vary with network capacity, routes, and utilization
Uses exponential increase everywhere (in slow start and congestion avoidance)
Multiplicative decrease factor b = 0.125
Introduced by Tom Kelly of Cambridge
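Scalable TCP's update rule can be sketched per RTT as follows (the real stack applies the increase per ACK; the increase constant a=0.01 is taken from Kelly's proposal, b=0.125 from the slide above):

```python
def scalable_update(cwnd, loss, a=0.01, b=0.125):
    """Scalable TCP: grow cwnd by a per ACK, i.e. a fixed fraction
    per RTT (exponential growth), and shrink by only 12.5% on loss."""
    if loss:
        return cwnd * (1.0 - b)   # gentle multiplicative decrease
    return cwnd + a * cwnd        # per-RTT effect of the per-ACK increase
```

Because both increase and decrease are proportional to cwnd, recovery time after a loss is independent of the window size, unlike Reno.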
Based on TCP Vegas
Uses both queuing delay and packet losses as congestion measures
Developed at Caltech by Steven Low and collaborators
Behaves like Reno for small values of cwnd
Above a chosen value of cwnd (default 38) a more aggressive function is used
Uses a table to indicate by how much to increase cwnd when an ACK is received
Introduced by Sally Floyd
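The table lookup can be sketched like this; the entries shown are a few illustrative rows in the spirit of the RFC 3649 response function, not the full table:

```python
# Illustrative (window threshold, increase a, decrease factor b) rows;
# the real HS-TCP table is much finer-grained.
HS_TABLE = [(38, 1, 0.50), (118, 2, 0.44), (221, 3, 0.41)]

def hs_params(cwnd):
    """Return (a, b): packets added per RTT and the decrease factor.
    Below the 38-packet threshold HS-TCP behaves exactly like Reno."""
    a, b = 1, 0.50                 # Reno regime
    for w, wa, wb in HS_TABLE:
        if cwnd >= w:
            a, b = wa, wb          # progressively more aggressive
    return a, b
```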
Mixture of HS-TCP with TCP-LP (Low Priority)
Backs off early in the face of congestion by looking at RTT
Idea is to give scavenger service without router modifications
From Rice University
Combines:
An additive increase used for large cwnd
A binary search increase used for small cwnd
Developed by Injong Rhee at NC State University
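The combination can be sketched as a simplified per-RTT model; the cap `s_max` and the behavior beyond the old maximum are simplifying assumptions, not the full Bic-TCP algorithm:

```python
def bic_increase(cwnd, w_max, s_max=32):
    """One RTT of Bic-TCP growth toward w_max, the window at the
    last loss: binary search when close, capped additive step when
    the gap is large."""
    if cwnd < w_max:
        step = (w_max - cwnd) / 2.0      # binary search increase
        return cwnd + min(step, s_max)   # additive increase for big gaps
    return cwnd + 1                      # past the old maximum: probe slowly
```

The binary search converges quickly near the last loss point while staying gentle there, which is one reason for Bic-TCP's good stability in these tests.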
Similar to HS-TCP in switching to an aggressive mode above a threshold
Uses a heterogeneous AIMD algorithm
Developed at the Hamilton Institute, Ireland
20-minute tests, long enough to see stable patterns
Iperf reports incremental and cumulative throughputs at 5-second intervals
Ping interval about 100 ms
At the sender: use 1 machine for iperf/TCP, a 2nd for cross-traffic (UDP or TCP), a 3rd for ping
At the receiver: use 1 machine for ping (echo) and TCP, a 2nd for cross-traffic
3 main network paths
Short distance: SLAC-Caltech (RTT~10ms)
Middle distance: UFL and DataTAG Chicago (RTT~70ms)
Long distance: CERN and University of Manchester (RTT~170ms)
Tests during nights and weekends to avoid unacceptable impacts on production traffic
Set large maximum windows (typically 32MB) on all hosts
Used 3 different windows with iperf:
Small window size, a factor of 2-4 below optimal
Roughly optimal window size (~BDP)
Oversized window
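The "roughly optimal" window is the bandwidth-delay product. A quick check, assuming a hypothetical 1 Gbit/s bottleneck (the talk does not state the exact link rates):

```python
def bdp_bytes(rtt_s, rate_bps):
    """Bandwidth-delay product: the bytes in flight needed to fill the pipe."""
    return rate_bps / 8.0 * rtt_s

# At 1 Gbit/s:  10 ms RTT (SLAC-Caltech) -> ~1.25 MB
#              170 ms RTT (CERN)         -> ~21.25 MB
# so the 32 MB maximum windows comfortably cover all three paths.
```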
Only P-TCP appears to dramatically affect the RTT
E.g. it increases the RTT by 200ms (a factor of 20 for short distances)
Regulates the size of the queue between the IP layer and the Ethernet layer
May increase the throughput if we find optimal values
But may increase duplicate ACKs (Y. T. Li)
Definition: standard deviation normalized by the average throughput
At short RTT (10ms) stability is usually good (<=12%)
At medium RTT (70ms) P-TCP, Scalable & Bic-TCP appear more stable than the other protocols
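The stability index defined above can be computed directly from the 5-second iperf throughput samples:

```python
from statistics import mean, stdev

def stability(samples):
    """Stability index from the talk: standard deviation of the
    throughput samples, normalized by their average (coefficient
    of variation). Smaller is more stable; <=12% reads as 'good'."""
    return stdev(samples) / mean(samples)

# A perfectly flat run scores 0.0; samples swinging 90-110 Mbps
# around a 100 Mbps mean score about 0.14.
```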
UDP does not back off in the face of congestion; it has a “stiff” behavior
We modified iperf to allow it to create UDP traffic with a sinusoidal time behavior, following an idea from Tom Hacker
See how TCP responds to varying cross-traffic
Used 2 periods of 30 and 60 seconds and amplitudes varying from 20 to 80 Mbps
Sent from the 2nd sending host to the 2nd receiving host while sending TCP from the 1st sending host to the 1st receiving host
As long as the window size was large enough, all protocols converged quickly and maintained a roughly constant aggregate throughput
Especially P-TCP & Bic-TCP
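The sinusoidal target rate for the modified iperf can be sketched as follows; the parameter defaults are illustrative (the talk used 30 s and 60 s periods with 20-80 Mbps amplitudes), and the function names are mine, not iperf's:

```python
import math

def udp_rate_mbps(t, mean_mbps=50.0, amplitude_mbps=30.0, period_s=60.0):
    """Target UDP cross-traffic rate at time t seconds: a sinusoid,
    so the TCP flow under test sees smoothly varying competition
    rather than the constant load of stock iperf UDP."""
    return mean_mbps + amplitude_mbps * math.sin(2.0 * math.pi * t / period_s)

# t=0 -> 50 Mbps; t=15 s (quarter period) -> peak of 80 Mbps
```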
Stability better at short distances
P-TCP & Bic more stable
Important to understand how fair a protocol is
For one protocol competing against the same protocol (intra-protocol) we define the fairness for a single bottleneck as:
All protocols have good intra-protocol fairness (F>0.98)
Except HS-TCP (F<0.94) when the window size > optimal
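The fairness formula itself did not survive the conversion of this slide; the standard single-bottleneck measure consistent with the quoted F values is Jain's fairness index, assumed here:

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Equals 1.0 when all flows get identical throughput and
    approaches 1/n when one flow starves the rest."""
    n = len(throughputs)
    s = sum(throughputs)
    return s * s / (n * sum(x * x for x in throughputs))

# Two equal flows -> 1.0; one flow at twice the other's rate -> 0.9
```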
Most have good intra-protocol fairness (diagonal elements), except HS-TCP
Inter-protocol: Bic-TCP & H-TCP appear more fair against others
Worst fairness: HSTCP-LP, P-TCP, S-TCP, and Fast
But cannot tell who is aggressive and who is timid
For inter-protocol fairness we introduce the asymmetry between the two throughputs:
Where x1 and x2 are the throughput averages of TCP stack 1 competing with TCP stack 2
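The asymmetry formula is also missing from this text; one natural normalized reading of the definition, consistent with the stated need to tell the aggressive stack from the timid one, is:

```python
def asymmetry(x1, x2):
    """Normalized asymmetry of two competing average throughputs:
    0 when the stacks share equally, approaching +1 (or -1) when
    stack 1 (or stack 2) takes nearly all the bandwidth. The sign
    identifies which stack is the aggressive one."""
    return (x1 - x2) / (x1 + x2)

# Equal sharing -> 0.0; stack 1 getting 150 vs 50 Mbps -> 0.5
```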
Cause queuing on the reverse path by using P-TCP with 16 streams
ACKs are lost or come back in bursts (compressed ACKs)
Fast TCP throughput is 4 to 8 times less than the other TCPs
Finish measurements to Manchester/CERN
More analysis
Work with Caltech to correlate with simulation
Compare with other people’s measurements
Test Westwood+
Tests with different RTTs on the same link
Try on 10Gbps links
More tests with multiple streams
Look at performance of rate-based protocols
Use with production applications
Advanced stacks behave like single-stream TCP Reno on short distances for paths up to Gbits/s, especially if the window size is limited
Single-stream TCP Reno has low performance and is unstable on long distances
P-TCP is very aggressive and impacts the RTT badly
HSTCP-LP is gentle; this can be important for providing scavenger service without router modifications. By design it backs off quickly; otherwise it performs well
Fast TCP is very handicapped by reverse traffic
S-TCP is very aggressive on long distances
HS-TCP is very gentle and, like H-TCP, has lower throughput than the other protocols
Bic-TCP performs very well in almost all cases
TCP Stacks Evaluation:
www-iepm.slac.stanford.edu/bw/tcp-eval/
With an optimal window all stacks are within ~20% of one another, except single-stream Reno on medium and long distances