Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks
Prepared by Les Cottrell & Hadrien Bullot, SLAC & EPFL, for the
BaBar/SCS meeting
November 11, 2003
www.slac.stanford.edu/grp/scs/net/talk03/babar-long-nov03.html
Test new advanced TCP stacks, see how they perform on short and long-distance real production WAN links
Compare & contrast: ease of configuration, throughput, convergence, fairness, stability etc.
For different RTTs, windows, txqueuelen
Recommend “optimum” stacks for data-intensive science (BaBar) transfers using bbftp, bbcp, GridFTP
Validate simulator & emulator findings & provide feedback
TCP only
No rate-based transport protocols (e.g. SABUL, UDT, RBUDP) at the moment
No iSCSI or FC over IP
Sender mods only; the HENP model is a few big senders, lots of smaller receivers
Simplifies deployment: only a few hosts at a few sending sites
No DRS (Dynamic Right-Sizing)
Runs on production nets
No router mods (XCP/ECN), no jumbo frames
Linux 2.4 New Reno with SACK: single and parallel streams (P-TCP)
Scalable TCP (S-TCP)
Fast TCP
HighSpeed TCP (HS-TCP)
HighSpeed TCP Low Priority (HSTCP-LP)
Binary Increase Control TCP (Bic-TCP)
Hamilton TCP (H-TCP)
Low performance on fast long-distance paths
AIMD (add a=1 packet to cwnd per RTT; decrease cwnd by factor b=0.5 on congestion)
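The AIMD rule above can be sketched as a minimal per-RTT model (not the kernel implementation):

```python
def aimd_update(cwnd, loss, a=1.0, b=0.5):
    """One RTT of TCP Reno congestion avoidance:
    add a packets to cwnd per RTT; on congestion, cut cwnd by factor b."""
    if loss:
        return max(1.0, cwnd * (1.0 - b))  # multiplicative decrease
    return cwnd + a                        # additive increase

# From cwnd = 100 packets: no loss -> 101, loss -> 50
```

On a 170 ms RTT path this linear recovery of 1 packet per RTT is what makes Reno so slow to regain a large window after a single loss.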
TCP Reno with 16 streams
Parallel streams are heavily used in HENP & elsewhere to achieve needed performance, so this is today’s de facto baseline
However, it is hard to optimize both the window size AND the number of streams, since the optimal values vary with network capacity, routes, and utilization
Uses exponential increase everywhere (in slow start and congestion avoidance)
Multiplicative decrease factor b = 0.125
Introduced by Tom Kelly of Cambridge
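Scalable TCP's update rule can be sketched per RTT as follows (the real stack applies the increase per ACK; the increase constant a=0.01 is taken from Kelly's proposal, b=0.125 from the slide above):

```python
def scalable_update(cwnd, loss, a=0.01, b=0.125):
    """Scalable TCP: grow cwnd by a per ACK, i.e. a fixed fraction
    per RTT (exponential growth), and shrink by only 12.5% on loss."""
    if loss:
        return cwnd * (1.0 - b)   # gentle multiplicative decrease
    return cwnd + a * cwnd        # per-RTT effect of the per-ACK increase
```

Because both increase and decrease are proportional to cwnd, recovery time after a loss is independent of the window size, unlike Reno.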
Based on TCP Vegas
Uses both queuing delay and packet losses as congestion measures
Developed at Caltech by Steven Low and collaborators
Behaves like Reno for small values of cwnd
Above a chosen value of cwnd (default 38) a more aggressive function is used
Uses a table to indicate by how much to increase cwnd when an ACK is received
Introduced by Sally Floyd
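The table lookup can be sketched like this; the entries shown are a few illustrative rows in the spirit of the RFC 3649 response function, not the full table:

```python
# Illustrative (window threshold, increase a, decrease factor b) rows;
# the real HS-TCP table is much finer-grained.
HS_TABLE = [(38, 1, 0.50), (118, 2, 0.44), (221, 3, 0.41)]

def hs_params(cwnd):
    """Return (a, b): packets added per RTT and the decrease factor.
    Below the 38-packet threshold HS-TCP behaves exactly like Reno."""
    a, b = 1, 0.50                 # Reno regime
    for w, wa, wb in HS_TABLE:
        if cwnd >= w:
            a, b = wa, wb          # progressively more aggressive
    return a, b
```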
Mixture of HS-TCP with TCP-LP (Low Priority)
Backs off early in the face of congestion by looking at RTT
Idea is to give scavenger service without router modifications
From Rice University
Combines:
An additive increase used for large cwnd
A binary search increase used for small cwnd
Developed by Injong Rhee at NC State University
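The combination can be sketched as a simplified per-RTT model; the cap `s_max` and the behavior beyond the old maximum are simplifying assumptions, not the full Bic-TCP algorithm:

```python
def bic_increase(cwnd, w_max, s_max=32):
    """One RTT of Bic-TCP growth toward w_max, the window at the
    last loss: binary search when close, capped additive step when
    the gap is large."""
    if cwnd < w_max:
        step = (w_max - cwnd) / 2.0      # binary search increase
        return cwnd + min(step, s_max)   # additive increase for big gaps
    return cwnd + 1                      # past the old maximum: probe slowly
```

The binary search converges quickly near the last loss point while staying gentle there, which is one reason for Bic-TCP's good stability in these tests.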
Similar to HS-TCP in switching to an aggressive mode above a threshold
Uses a heterogeneous AIMD algorithm
Developed at the Hamilton Institute, Ireland
20-minute tests, long enough to see stable patterns
Iperf reports incremental and cumulative throughputs at 5-second intervals
Ping interval about 100 ms
At the sender: use 1 machine for iperf/TCP, a 2nd for cross-traffic (UDP or TCP), a 3rd for ping
At the receiver: use 1 machine for ping (echo) and TCP, a 2nd for cross-traffic
3 main network paths
Short distance: SLAC-Caltech (RTT~10ms)
Middle distance: UFL and DataTAG Chicago (RTT~70ms)
Long distance: CERN and University of Manchester (RTT~170ms)
Tests during nights and weekends to avoid unacceptable impacts on production traffic
Set large maximum windows (typically 32MB) on all hosts
Used 3 different windows with iperf:
Small window size, a factor of 2-4 below optimal
Roughly optimal window size (~BDP)
Oversized window
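The "roughly optimal" window is the bandwidth-delay product. A quick check, assuming a hypothetical 1 Gbit/s bottleneck (the talk does not state the exact link rates):

```python
def bdp_bytes(rtt_s, rate_bps):
    """Bandwidth-delay product: the bytes in flight needed to fill the pipe."""
    return rate_bps / 8.0 * rtt_s

# At 1 Gbit/s:  10 ms RTT (SLAC-Caltech) -> ~1.25 MB
#              170 ms RTT (CERN)         -> ~21.25 MB
# so the 32 MB maximum windows comfortably cover all three paths.
```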
Only P-TCP appears to dramatically affect the RTT
E.g. it increases the RTT by 200ms (a factor of 20 for short distances)
Regulates the size of the queue between the IP layer and the Ethernet layer
May increase the throughput if we find optimal values
But may increase duplicate ACKs (Y. T. Li)
Definition: standard deviation normalized by the average throughput
At short RTT (10ms) stability is usually good (<=12%)
At medium RTT (70ms) P-TCP, Scalable & Bic-TCP appear more stable than the other protocols
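The stability index defined above can be computed directly from the 5-second iperf throughput samples:

```python
from statistics import mean, stdev

def stability(samples):
    """Stability index from the talk: standard deviation of the
    throughput samples, normalized by their average (coefficient
    of variation). Smaller is more stable; <=12% reads as 'good'."""
    return stdev(samples) / mean(samples)

# A perfectly flat run scores 0.0; samples swinging 90-110 Mbps
# around a 100 Mbps mean score about 0.14.
```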
UDP does not back off in the face of congestion; it has a “stiff” behavior
We modified iperf to allow it to create UDP traffic with a sinusoidal time behavior, following an idea from Tom Hacker
See how TCP responds to varying cross-traffic
Used 2 periods of 30 and 60 seconds and amplitudes varying from 20 to 80 Mbps
Sent from the 2nd sending host to the 2nd receiving host while sending TCP from the 1st sending host to the 1st receiving host
As long as the window size was large enough, all protocols converged quickly and maintained a roughly constant aggregate throughput
Especially P-TCP & Bic-TCP
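The sinusoidal target rate for the modified iperf can be sketched as follows; the parameter defaults are illustrative (the talk used 30 s and 60 s periods with 20-80 Mbps amplitudes), and the function names are mine, not iperf's:

```python
import math

def udp_rate_mbps(t, mean_mbps=50.0, amplitude_mbps=30.0, period_s=60.0):
    """Target UDP cross-traffic rate at time t seconds: a sinusoid,
    so the TCP flow under test sees smoothly varying competition
    rather than the constant load of stock iperf UDP."""
    return mean_mbps + amplitude_mbps * math.sin(2.0 * math.pi * t / period_s)

# t=0 -> 50 Mbps; t=15 s (quarter period) -> peak of 80 Mbps
```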
Stability better at short distances
P-TCP & Bic more stable
Important to understand how fair a protocol is
For one protocol competing against the same protocol (intra-protocol) we define the fairness for a single bottleneck as:
All protocols have good intra-protocol fairness (F>0.98)
Except HS-TCP (F<0.94) when the window size > optimal
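The fairness formula itself did not survive the conversion of this slide; the standard single-bottleneck measure consistent with the quoted F values is Jain's fairness index, assumed here:

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Equals 1.0 when all flows get identical throughput and
    approaches 1/n when one flow starves the rest."""
    n = len(throughputs)
    s = sum(throughputs)
    return s * s / (n * sum(x * x for x in throughputs))

# Two equal flows -> 1.0; one flow at twice the other's rate -> 0.9
```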
Most have good intra-protocol fairness (diagonal elements), except HS-TCP
Inter-protocol: Bic-TCP & H-TCP appear more fair against others
Worst fairness: HSTCP-LP, P-TCP, S-TCP, and Fast
But cannot tell who is aggressive and who is timid
For inter-protocol fairness we introduce the asymmetry between the two throughputs:
Where x1 and x2 are the throughput averages of TCP stack 1 competing with TCP stack 2
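The asymmetry formula is also missing from this text; one natural normalized reading of the definition, consistent with the stated need to tell the aggressive stack from the timid one, is:

```python
def asymmetry(x1, x2):
    """Normalized asymmetry of two competing average throughputs:
    0 when the stacks share equally, approaching +1 (or -1) when
    stack 1 (or stack 2) takes nearly all the bandwidth. The sign
    identifies which stack is the aggressive one."""
    return (x1 - x2) / (x1 + x2)

# Equal sharing -> 0.0; stack 1 getting 150 vs 50 Mbps -> 0.5
```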
Cause queuing on the reverse path by using P-TCP with 16 streams
ACKs are lost or come back in bursts (compressed ACKs)
Fast TCP throughput is 4 to 8 times less than the other TCPs
Finish measurements to Manchester/CERN
More analysis
Work with Caltech to correlate with simulation
Compare with other people’s measurements
Test Westwood+
Tests with different RTTs on the same link
Try on 10Gbps links
More tests with multiple streams
Look at performance of rate-based protocols
Use with production applications
Advanced stacks behave like single-stream TCP Reno on short distances for paths up to Gbits/s, especially if the window size is limited
Single-stream TCP Reno has low performance and is unstable on long distances
P-TCP is very aggressive and impacts the RTT badly
HSTCP-LP is gentle; this can be important for providing scavenger service without router modifications. By design it backs off quickly; otherwise it performs well
Fast TCP is very handicapped by reverse traffic
S-TCP is very aggressive on long distances
HS-TCP is very gentle and, like H-TCP, has lower throughput than the other protocols
Bic-TCP performs very well in almost all cases
TCP Stacks Evaluation:
www-iepm.slac.stanford.edu/bw/tcp-eval/
With an optimal window all stacks are within ~20% of one another, except single-stream Reno on medium and long distances