1
|
- Les Cottrell & Fabrizio Coccetti – SLAC
- Prepared for Internet2, Washington, April 2003
- http://www.slac.stanford.edu/grp/scs/net/talk/fast-i2-apr03.html
|
2
|
- High throughput challenges
- New TCP stacks
- Tests on Unloaded (testbed) links
- Performance of multi-streams
- Performance of various stacks
- Tests on Production networks
- Stack comparisons with single streams
- Stack comparisons with multiple streams
- Fairness
- Where do I find out more?
|
3
|
- After a loss it can take over an hour for stock TCP (Reno) to recover to maximum throughput at 1 Gbits/s
- i.e. a loss rate of 1 in ~2 Gpkts (3 Tbits), or a BER of 1 in 3.6×10^12
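To see where a figure like this comes from, here is a rough back-of-the-envelope sketch: after a loss, Reno halves its congestion window and then regrows it by one segment per round trip, so recovery takes half a window's worth of RTTs. The 300 ms RTT and 1500 B segments below are illustrative assumptions, not the talk's exact parameters.

# Rough AIMD recovery-time estimate (illustrative numbers, not from the talk).
bandwidth_bps = 1e9   # 1 Gbit/s link
rtt_s = 0.3           # assumed long-haul round-trip time
mss_bytes = 1500      # standard Ethernet payload

# Window (in segments) needed to fill the pipe: the bandwidth-delay product.
window = bandwidth_bps * rtt_s / (mss_bytes * 8)

# After a loss Reno halves cwnd, then adds one segment per RTT,
# so it needs window/2 round trips to get back to full speed.
recovery_s = (window / 2) * rtt_s
print(f"window ~ {window:.0f} segments, recovery ~ {recovery_s / 60:.0f} minutes")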
|
4
|
- Reno (AIMD) based: loss indicates congestion (update rules sketched below)
- Back off less when congestion is seen
- Recover more quickly after backing off
- Scalable TCP: exponential recovery
- Tom Kelly, Scalable TCP: Improving Performance in Highspeed Wide Area Networks, submitted for publication, December 2002
- High Speed TCP: same as Reno at low performance, then increases the window more and more aggressively as the window grows, using a table
- Vegas based: RTT indicates congestion
- Caltech FAST TCP: quicker response to congestion, but …
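To make the loss-based variants concrete, here is a minimal sketch of their per-ACK and per-loss window updates. The Scalable TCP constants (0.01 and 0.875) are the ones Kelly proposes; the High Speed TCP table rows are illustrative stand-ins for Floyd's full lookup table; FAST is omitted because it reacts to RTT measurements rather than loss.

# Minimal sketch of per-ACK / per-loss congestion-window updates.
# Not kernel code; constants as noted in the text above.

def reno(cwnd, loss):
    # Standard AIMD: add ~1 segment per RTT, halve on loss.
    return cwnd / 2 if loss else cwnd + 1 / cwnd

def scalable(cwnd, loss):
    # Scalable TCP (Kelly): fixed proportional rules, so recovery time
    # no longer grows with the window size.
    return cwnd * 0.875 if loss else cwnd + 0.01

# Illustrative (made-up) rows of the HSTCP lookup table: (window, a, b).
HSTCP_TABLE = ((38, 1, 0.50), (10000, 10, 0.35))

def highspeed(cwnd, loss):
    # High Speed TCP (Floyd): Reno-like below a threshold window, then a
    # more aggressive increase a(w) and a gentler backoff b(w) from a table.
    a, b = 1, 0.5
    for w, ai, bi in HSTCP_TABLE:
        if cwnd >= w:
            a, b = ai, bi
    return cwnd * (1 - b) if loss else cwnd + a / cwnd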
|
6
|
- Caltech: Harvey Newman, Steven Low, Sylvain Ravot, Cheng Jin, Xiaoling
Wei, Suresh Singh, Julian Bunn
- SLAC: Les Cottrell, Gary Buhrmaster, Fabrizio Coccetti
- LANL: Wu-chun Feng, Eric Weigle, Gus Hurwitz, Adam Englehart
- NIKHEF/UvA: Cees de Laat, Antony Antony
- CERN: Olivier Martin, Paolo Moroni
- ANL: Linda Winkler
- DataTAG, StarLight, TeraGrid, SURFnet, NetherLight, Deutsche Telekom, Information Society Technologies
- Cisco, Level(3), Intel
- DoE, European Commission, NSF
|
7
|
- Well accepted that multiple streams (n) and/or big windows are important to achieve optimal throughput
- Effectively reduces the impact of a single loss to 1/n of the aggregate, and shortens the recovery time by a factor of n (see the sketch after this list)
- Optimum windows & streams change as the path changes (e.g. with utilization), so n is hard to optimize
- Can be unfriendly to others
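A back-of-the-envelope sketch of that 1/n effect, assuming a pipe's worth of window split evenly across n Reno streams and a single loss hitting one of them (the 25000-segment window and 300 ms RTT are assumptions carried over from the earlier estimate):

# Why n parallel streams blunt a single loss (simplified Reno model).
def loss_impact(total_window, n_streams, rtt_s):
    per_stream = total_window / n_streams
    lost = per_stream / 2          # only the unlucky stream halves its window
    aggregate = 1 - lost / total_window
    recovery_s = lost * rtt_s      # it regrows at one segment per RTT
    return aggregate, recovery_s

for n in (1, 4, 16):
    frac, t = loss_impact(total_window=25000, n_streams=n, rtt_s=0.3)
    print(f"n={n:2d}: aggregate drops to {frac:.1%}, recovers in ~{t:.0f} s")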
|
8
|
- Above the knee, performance still improves slowly, perhaps because the large number of streams squeezes out other traffic and takes more than its fair share
|
9
|
- Need to measure all combinations to understand the effects of the parameters and configurations:
- Windows, streams, txqueuelen, TCP stack, MTU, NIC card (a Linux inspection sketch follows this list)
- Lots of variables
- Examples of 2 TCP stacks
- FAST TCP no longer needs multiple streams; this is a major simplification (it reduces the number of variables to tune by one)
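As a small aid for the tuning bullet above, a minimal sketch (assuming a Linux host; the eth0 interface name is a placeholder) that dumps the kernel settings behind the window and txqueuelen knobs:

from pathlib import Path

# Kernel tunables behind the socket/window sizes discussed above.
TUNABLES = [
    "/proc/sys/net/core/rmem_max",   # max receive socket buffer
    "/proc/sys/net/core/wmem_max",   # max send socket buffer
    "/proc/sys/net/ipv4/tcp_rmem",   # min/default/max TCP receive buffer
    "/proc/sys/net/ipv4/tcp_wmem",   # min/default/max TCP send buffer
]

for t in TUNABLES:
    p = Path(t)
    print(t, "=", p.read_text().strip() if p.exists() else "n/a")

# txqueuelen (packets queued for the NIC) is per interface.
tx = Path("/sys/class/net/eth0/tx_queue_len")  # "eth0" is an assumed name
print("txqueuelen =", tx.read_text().strip() if tx.exists() else "n/a")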
|
20
|
- With a single flow & an empty network:
- Can saturate 2.5 Gbps with standard TCP & jumbos
- Can saturate 1 Gbps with new stacks & 1500 B frames, or with standard TCP & jumbos
- With a production network:
- FAST can take a while to get going
- Once going, FAST TCP with one stream looks good compared to multi-stream Reno
- FAST can back down early compared to Reno
- More work needed on fairness
- Scalable TCP does not look as good vs. multi-stream Reno
|
21
|
- Go beyond 2.5 Gbits/s
- Disk-to-disk throughput & useful applications
- Need faster CPUs (disk-to-disk needs an extra ~60% MHz per Mbit/s over plain TCP); understand how to use multi-processors
- Further evaluate new stacks with real-world links, and other equipment
- Other NICs
- Response to congestion, pathologies
- Fairness
- Deploy for some major (e.g. HENP/Grid) customer applications
- Understand how to make 10GE NICs work well with 1500 B MTUs
- Move from “hero” demonstrations to commonplace use
|
22
|
- 10GE tests
- www-iepm.slac.stanford.edu/monitoring/bulk/10ge/
- sravot.home.cern.ch/sravot/Networking/10GbE/10GbE_test.html
- TCP stacks
- netlab.caltech.edu/FAST/
- datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf
- www.icir.org/floyd/hstcp.html
- Stack comparisons
- www-iepm.slac.stanford.edu/monitoring/bulk/fast/
- www.csm.ornl.gov/~dunigan/net100/floyd.html
- www-iepm.slac.stanford.edu/monitoring/bulk/tcpstacks/
|
26
|
- Large windows and a large number of streams can cause the last stream to take a long time to close
- Linux memory leak
- Linux TCP configuration caching
- What window size is actually used/reported?
- 32-bit counters in iperf and routers wrap; need the latest releases with 64-bit counters (see the sketch after this list)
- Effects of txqueuelen (number of packets queued for the NIC)
- Routers do not pass jumbos
- Performance differs between drivers and NICs from different manufacturers
- May require tuning a lot of parameters
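For the counter-wrap bullet, simple arithmetic shows why 32 bits is not enough at these rates (the line rates below are illustrative):

# Time for an unsigned byte counter to wrap at a given line rate.
def wrap_seconds(counter_bits, rate_bps):
    return 2 ** counter_bits / (rate_bps / 8)

for rate in (1e9, 2.5e9, 10e9):
    print(f"{rate / 1e9:4.1f} Gbit/s: 32-bit wraps in {wrap_seconds(32, rate):5.1f} s, "
          f"64-bit in ~{wrap_seconds(64, rate) / 3.15e7:.0f} years")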
|