FAST TCP for Multi-Gbps WAN: Experiments and Applications

Les Cottrell & Fabrizio Coccetti, SLAC

Prepared for Internet2, Washington, April 2003

http://www.slac.stanford.edu/grp/scs/net/talk/fast-i2-apr03.html
Outline

- High throughput challenges
- New TCP stacks
- Tests on unloaded (testbed) links
  - Performance of multiple streams
  - Performance of various stacks
- Tests on production networks
  - Stack comparisons with single streams
  - Stack comparisons with multiple streams
  - Fairness
- Where do I find out more?
High Speed Challenges

- After a loss it can take over an hour for stock TCP (Reno) to recover to maximum throughput at 1 Gbits/s
- i.e. sustaining full rate needs a loss rate below 1 in ~2 Gpkts (3 Tbits), or a BER of 1 in 3.6×10^12
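As a rough sanity check on the recovery-time claim, here is a minimal back-of-the-envelope sketch. The 1500 B MTU and 180 ms RTT are illustrative assumptions, and the model is the textbook AIMD one (cwnd halves on loss, then grows by one packet per RTT):

```python
# Back-of-the-envelope AIMD recovery for stock TCP (Reno).
# Assumptions (illustrative): 1 Gbps link, 1500 B packets, 180 ms RTT,
# cwnd halves on loss and then grows by 1 packet per RTT.
rate_bps = 1e9
pkt_bits = 1500 * 8
rtt_s = 0.180

# Window (in packets) needed to fill the pipe: the bandwidth-delay product.
w = rate_bps * rtt_s / pkt_bits              # ~15,000 packets

# Recovery takes ~w/2 RTTs (twice that with delayed ACKs,
# which acknowledge only every other segment).
recovery_s = (w / 2) * rtt_s
print(f"window ~{w:,.0f} pkts; recovery ~{recovery_s/60:.0f} min "
      f"(~{recovery_s*2/60:.0f} min with delayed ACKs)")
```

This gives roughly 22 min (or 45 min with delayed ACKs) at 180 ms; at the ~230 ms SLAC to CERN RTT used later in the talk, the delayed-ACK estimate exceeds an hour.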
New TCP Stacks

- Reno (AIMD) based: loss indicates congestion
  - Back off less when congestion is seen
  - Recover more quickly after backing off
  - Scalable TCP: exponential recovery
    - Tom Kelly, "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", submitted for publication, December 2002
  - High Speed TCP: same as Reno at low performance, then increases the window more and more aggressively as the window grows, using a table
- Vegas based: RTT indicates congestion
  - Caltech FAST TCP: quicker response to congestion, but … (the update rules are sketched below)
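For concreteness, a simplified sketch of the per-ACK / per-loss window-update rules behind these stacks. The Scalable TCP constants are from Kelly's paper and the FAST rule follows the published update; the HSTCP constants shown are roughly one high-window row of Floyd's table, so treat the exact values as illustrative:

```python
# Simplified congestion-window update rules (cwnd in packets).

def reno(cwnd, loss):
    # AIMD: +1 packet per RTT (1/cwnd per ACK), halve on loss.
    return cwnd * 0.5 if loss else cwnd + 1.0 / cwnd

def scalable(cwnd, loss):
    # Kelly's Scalable TCP: a = 0.01 per ACK, b = 1/8 on loss.
    # Growth is multiplicative, so recovery takes a constant number
    # of RTTs ("exponential recovery") instead of scaling with cwnd.
    return cwnd * (1 - 0.125) if loss else cwnd + 0.01

def hstcp(cwnd, loss, a=72, b=0.1):
    # High Speed TCP: a(w) and b(w) come from a table keyed on cwnd;
    # a=72, b=0.1 is roughly the ~83,000-packet end of the table.
    # At small cwnd the table reduces to Reno (a=1, b=0.5).
    return cwnd * (1 - b) if loss else cwnd + a / cwnd

def fast(w, base_rtt, rtt, alpha=200, gamma=0.5):
    # FAST TCP is delay-based: it reacts to queueing delay
    # (rtt - base_rtt) at every update interval, not to loss.
    return min(2 * w, (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha))
```

The Vegas branch differs in kind: FAST adjusts on every RTT measurement rather than waiting for a loss, which is why it can respond to congestion more quickly.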
Typical testbed
Testbed Collaborators and Sponsors

- Caltech: Harvey Newman, Steven Low, Sylvain Ravot, Cheng Jin, Xiaoling Wei, Suresh Singh, Julian Bunn
- SLAC: Les Cottrell, Gary Buhrmaster, Fabrizio Coccetti
- LANL: Wu-chun Feng, Eric Weigle, Gus Hurwitz, Adam Englehart
- NIKHEF/UvA: Cees DeLaat, Antony Antony
- CERN: Olivier Martin, Paolo Moroni
- ANL: Linda Winkler
- DataTAG, StarLight, TeraGrid, SURFnet, NetherLight, Deutsche Telekom, Information Society Technologies
- Cisco, Level(3), Intel
- DoE, European Commission, NSF
Windows and Streams

- It is well accepted that multiple streams (n) and/or big windows are important to achieve optimal throughput
- Multiple streams effectively reduce the impact of a loss by 1/n and improve recovery time by 1/n (see the arithmetic sketched below)
- The optimum windows & streams change as the path changes (e.g. with utilization), so n is hard to optimize
- Can be unfriendly to others
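A minimal sketch of the arithmetic behind the 1/n claim. The 1 Gbps rate, 180 ms RTT, and 1500 B MTU are illustrative assumptions, not measurements from the talk:

```python
# Why n parallel streams soften a single loss: each stream holds
# ~1/n of the aggregate window, so halving one stream's window
# removes only 1/(2n) of the aggregate, and that stream needs
# ~W/(2n) RTTs (not W/2) to climb back.
rate_bps, rtt_s, mtu_bits = 1e9, 0.180, 1500 * 8
W = rate_bps * rtt_s / mtu_bits        # aggregate window, in packets

for n in (1, 4, 16):
    drop = 1 / (2 * n)                 # fraction of aggregate rate lost
    recovery_s = (W / (2 * n)) * rtt_s # time for the hit stream to recover
    print(f"n={n:2d}: lose {drop:.1%} of rate, recover in ~{recovery_s:.0f} s")
```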
Even with big windows (1 MB), still need multiple streams with standard TCP

- Above the knee, performance still improves slowly, perhaps because the large number of streams squeezes out other traffic and takes more than a fair share
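One reason a 1 MB window alone is not enough: a single window-limited stream cannot exceed window/RTT. A quick illustrative computation (the 180 ms RTT is an assumption):

```python
# Throughput ceiling of one window-limited TCP stream: window / RTT.
window_bytes = 1 * 1024**2             # 1 MB window
rtt_s = 0.180                          # assumed WAN RTT
ceiling_mbps = window_bytes * 8 / rtt_s / 1e6
print(f"1 MB window @ {rtt_s*1e3:.0f} ms RTT -> ~{ceiling_mbps:.0f} Mbps per stream")
# ~47 Mbps per stream, so filling 1 Gbps needs on the order of 20 streams.
```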
Stock vs FAST TCP, MTU = 1500 B

- Need to measure all parameters to understand the effects of parameters and configurations:
  - Windows, streams, txqueuelen, TCP stack, MTU, NIC card
- A lot of variables (a sweep of this kind is sketched below)
- Examples of 2 TCP stacks
- FAST TCP no longer needs multiple streams; this is a major simplification (reduces the number of variables to tune by 1)
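A hedged sketch of how such a measurement sweep might be scripted around iperf (the server name and parameter grids below are placeholders; iperf's -w flag sets the window/buffer size and -P the number of parallel streams):

```python
# Sweep TCP window sizes and stream counts against an iperf server.
# "iperf-server.example.org" and the grids below are placeholders.
import itertools
import subprocess

HOST = "iperf-server.example.org"
windows = ["256K", "1M", "4M"]      # -w: socket buffer / window size
streams = [1, 2, 4, 8, 16]          # -P: parallel streams

for w, p in itertools.product(windows, streams):
    out = subprocess.run(
        ["iperf", "-c", HOST, "-w", w, "-P", str(p), "-t", "30", "-f", "m"],
        capture_output=True, text=True).stdout
    print(f"window={w} streams={p}\n{out}")
```

Each combination would typically be repeated and logged; the point is only that adding MTU, TCP stack, and txqueuelen as further axes makes the parameter space grow multiplicatively.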
TCP stacks with 1500 B MTU @ 1 Gbps
Jumbo frames, new TCP stacks at 1 Gbits/s
Production network tests
High Speed TCP vs Reno: 1 stream
Scalable vs multiple streams
FAST & Scalable vs. multi-stream Reno (SLAC > CERN, ~230 ms)
Scalable & FAST TCP with 1 stream vs Reno with n streams
Fairness: FAST vs Reno
Summary (very preliminary)

- With a single flow & an empty network:
  - Can saturate 2.5 Gbps with standard TCP & jumbo frames
  - Can saturate 1 Gbps with new stacks & 1500 B frames, or with standard TCP & jumbos
- On a production network:
  - FAST can take a while to get going
  - Once going, FAST TCP with one stream looks good compared to multi-stream Reno
  - FAST can back down early compared to Reno
  - More work is needed on fairness
- Scalable:
  - Does not look as good vs. multi-stream Reno
What's next?

- Go beyond 2.5 Gbits/s
- Disk-to-disk throughput & useful applications
  - Need faster CPUs (an extra 60% MHz per Mbits/s over plain TCP for disk-to-disk; see the estimate after this list), and need to understand how to use multi-processors
- Further evaluate new stacks with real-world links and other equipment
  - Other NICs
  - Response to congestion, pathologies
  - Fairness
- Deploy for some major (e.g. HENP/Grid) customer applications
- Understand how to make 10GE NICs work well with 1500 B MTUs
- Move from "hero" demonstrations to the commonplace
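A rough illustration of what that CPU figure implies. The 1 MHz-per-Mbps baseline for plain TCP is a common rule of thumb assumed here, not a number from the talk:

```python
# CPU needed for disk-to-disk at a given rate, assuming ~1 MHz/Mbps
# for plain TCP (rule-of-thumb assumption) plus the extra 60%
# quoted in the talk for the disk-to-disk path.
rate_mbps = 2500                      # target: 2.5 Gbps
tcp_mhz_per_mbps = 1.0                # assumed baseline
disk_overhead = 0.60                  # extra 60% from the talk
cpu_ghz = rate_mbps * tcp_mhz_per_mbps * (1 + disk_overhead) / 1000
print(f"~{cpu_ghz:.1f} GHz of CPU for {rate_mbps} Mbps disk-to-disk")
# => ~4.0 GHz, beyond a single 2003-era processor: hence the
# interest in multi-processor hosts.
```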
More Information

- 10GE tests
  - www-iepm.slac.stanford.edu/monitoring/bulk/10ge/
  - sravot.home.cern.ch/sravot/Networking/10GbE/10GbE_test.html
- TCP stacks
  - netlab.caltech.edu/FAST/
  - datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf
  - www.icir.org/floyd/hstcp.html
- Stack comparisons
  - www-iepm.slac.stanford.edu/monitoring/bulk/fast/
  - www.csm.ornl.gov/~dunigan/net100/floyd.html
  - www-iepm.slac.stanford.edu/monitoring/bulk/tcpstacks/
Extras
FAST TCP vs. Reno: 1 stream
Scalable vs. Reno: 1 stream
Other high speed gotchas

- Large windows and a large number of streams can cause the last stream to take a long time to close
- Linux memory leak
- Linux TCP configuration caching
- What window size is actually used/reported?
- 32-bit counters in iperf and routers wrap; need the latest releases with 64-bit counters (see the sketch below)
- Effects of txqueuelen (the number of packets queued for the NIC)
- Routers that do not pass jumbo frames
- Performance differs between drivers and between NICs from different manufacturers
- May require tuning a lot of parameters
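To see how quickly a 32-bit counter wraps at these speeds, a small illustrative computation:

```python
# A 32-bit byte counter wraps at 2^32 bytes (~4.3 GB).
# At multi-Gbps rates that happens in well under a minute.
WRAP_BYTES = 2**32
for rate_gbps in (1, 2.5, 10):
    secs = WRAP_BYTES * 8 / (rate_gbps * 1e9)
    print(f"{rate_gbps:>4} Gbps: byte counter wraps every ~{secs:.0f} s")
# 1 Gbps -> ~34 s, 2.5 Gbps -> ~14 s, 10 Gbps -> ~3.4 s
```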