Some people in Internet2 are starting to experiment with some form of
scavenger QoS service. We (SLAC) believed we could be of some assistance
with this, as we have a critical need to transmit large amounts of
data between BaBar sites. These include IN2P3 in Lyon, France; CERN;
RAL in the UK; INFN in Rome, Italy; Caltech; as well as LLNL, LBNL and
Colorado. We can utilize most of the available bandwidth for quite long
periods (days) by using large windows and multiple streams
(see for example
Bulk
throughput measurements
and
Bulk throughput: streams vs. Windows).
One possibility for reducing the impact of our traffic on others is
to try a scavenger QoS service.
The basic idea of QBSS, according to Stanislav Shalunov of Internet2, is:
The gist is that you'd mark your bulk flows (which already have
congestion control) with a special DSCP value (001000), and this value
would be passed through by all networks involved. Some networks may
choose to ignore the marking and treat this traffic just like the
default best-effort class. Other networks (particularly those that
experience congestion) would use a variation of weighted round-robin
queuing discipline (or whatever their router vendor calls it) to give
QBSS a very small percentage of link share on head ends of congested
links. This way, QBSS would use whatever is left over from the
default best-effort class. Within QBSS, there would be normal
competition of the same exact sort we find in BE.
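As an illustration (not from the original page), marking a bulk TCP flow with the QBSS
DSCP value 001000 amounts to setting the IP TOS byte to 0x20 on the sending socket.
A minimal Python sketch of one way this could be done is shown below; the connect
target is a placeholder, not a real host.

    import socket

    QBSS_DSCP = 0b001000            # QBSS DiffServ codepoint (decimal 8)
    TOS_VALUE = QBSS_DSCP << 2      # DSCP occupies the top six bits of the TOS byte -> 0x20

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
    # sock.connect(("some.bulk.transfer.host", 5001))   # placeholder peer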
Routes
One critical point is to understand the routes from SLAC
(or Stanford) to the
candidate sites mentioned above, to see whether they pass through
Internet2. A second requirement is that the path pass through
some point where there is congestion. The traceroutes
below indicate which paths use Internet2/Abilene, and the pipechars
give some idea of where there may be congestion and what the bottleneck
bandwidth is. The standard deviation (sd) of the RTT also
gives some idea of the congestion (larger is more congested).
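For reference, a minimal sketch (not part of the original measurements) of how the minimum
RTT and its standard deviation might be estimated from a series of pings; it assumes a
Unix-style ping whose per-packet output contains "time=" values.

    import re
    import statistics
    import subprocess

    def rtt_stats(host, count=10):
        """Return (min RTT, standard deviation) in msec from a series of pings."""
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True).stdout
        rtts = [float(t) for t in re.findall(r"time[=<]([\d.]+)", out)]
        return min(rtts), statistics.stdev(rtts)

    # Example (any of the remote hosts in the tables below could be used):
    # print(rtt_stats("pawn.eos.gsfc.nasa.gov"))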
Path characteristics seen from SLAC
Remote site (traceroute) | Pipechar bottleneck | Min RTT (sd) | Map | Sustained iperf throughput | AS's
Caltech, Pasadena, CA | 75 Mbps | 25 (0.13) msec | No | 40 Mbps (Dec-00) | CalREN2
CERN, Geneva, Switzerland | 69 Mbps | 160 (0.23) msec | Yes | 56 Mbps (Mar-01) | ESnet
Colorado, Boulder, CO | 94 Mbps | 27.4 (0.91) msec | No | 20 Mbps (May-01) | CalREN2, Abilene
Daresbury Lab, Liverpool, England | 17 Mbps | 151 (3.42) msec | Yes | 40 Mbps (Mar-01) | ESnet, JANet
GSFC/NASA | 80 Mbps | 64 (22) msec | No | | CalREN2, Abilene, NASA
IN2P3, Lyon, France | 27 Mbps | 147 (0.31) msec | Yes | 30 Mbps (Mar-01) | ESnet, PHYnet
INFN, Rome, Italy | 31 Mbps | 180 (0.26) msec | Yes | 26 Mbps (Mar-01) | ESnet, TEN-155, GARR
UTDallas, TX | 23 Mbps | 84.6 (43) msec | No | 12 Mbps (Feb-01) | CalREN2, Abilene
Path characteristics seen from Stanford
Remote site (traceroute) | Pipechar bottleneck | Min RTT (sd) | Map | Sustained iperf throughput | AS's
Daresbury Lab, Liverpool, England | 16 Mbps | 141 (4.1) msec | No | 32 Mbps (May-01) | CalREN2, Abilene, JAnet
From the above tables it appears that good candidates for QBSS testing (i.e. the path
passes through Abilene and there is congestion identified by the pipechar measurements and
the standard deviation of the RTT) are UTDallas, GSFC and Daresbury Lab.
Unfortunately, according to Joe Izen of UT Dallas:
My network group explained that the router with plenty
of backplane bandwidth is running out of CPU cycles filtering more
packets in addition to its routing duties.
and as stated by Stanislav Shalunov:
In cases where apparent congestion is caused by scarcity of computing
cycles in a router rather than scarcity of link capacity one would
expect that any QoS technique would only hurt instead of helping.
GSFC
With the help of Andy Germain of GSFC we were able to get an
iperf server set up on 198.10.49.61 (pawn.eos.gsfc.nasa.gov),
running FreeBSD 3.5-RELEASE #0 on a 200 MHz Intel CPU.
This machine has a Gbps Ethernet interface. The
pipechar from SLAC to this machine indicates
a bottleneck of about 150 Mbps. The iperf server only allowed up to
9 parallel flows. We ran the TCP iperf client from
tersk07.slac.stanford.edu, a Sun Netra t 1400/1405
with 4 x 440 MHz CPUs and 4 GB of memory running
Solaris 5.7, for 1, 5, 8, 4, 3, 7, 9, 2 and 6 flows and window
sizes of 8, 1024, 16, 512, 32, 256, 128 and 64 kBytes (see
Bulk Throughput for the methodology). The first graph below
shows the average TCP throughput (averaged over all flow settings). The second
graph shows the throughput by flow, averaged over all the window sizes.
It can be seen that there is much more variation with window size than with the number of
flows; in fact increasing the number of flows may have a detrimental effect on
the throughput. This is different from the behavior seen for many other links
(see
Bulk Throughput and
Bulk throughput: Windows vs. Streams) where the use of
more streams is more effective than using large windows in achieving high
throughput. It can also be seen that there appears to be an optimum window
size of around 500 kBytes. This is less than would be predicted by the
RTT * bottleneck-bandwidth product, which predicts about 1 MByte
(64 msec * 150 Mbits/s / 8 bits per byte is roughly 1.2 MBytes).
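A minimal sketch of how such a flows x window-size sweep could be scripted is shown below.
The server name is taken from above, but the iperf options, test duration and output handling
are assumptions for illustration, not the actual SLAC measurement scripts.

    import subprocess

    SERVER = "pawn.eos.gsfc.nasa.gov"                   # iperf server at GSFC (see above)
    FLOWS = [1, 5, 8, 4, 3, 7, 9, 2, 6]                 # numbers of parallel TCP streams
    WINDOWS_KB = [8, 1024, 16, 512, 32, 256, 128, 64]   # TCP window sizes in kBytes

    for window in WINDOWS_KB:
        for flows in FLOWS:
            # -P parallel streams, -w window size, -t test length in seconds,
            # -f m reports throughput in Mbits/s
            cmd = ["iperf", "-c", SERVER, "-P", str(flows),
                   "-w", "%dK" % window, "-t", "30", "-f", "m"]
            subprocess.run(cmd)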
The following graphs show the throughput broken down by flows and windows
for 3 different sets of measurements at different times
(1st: May 11, 2001 22:58 - 23:22 PDT;
2nd: May 12, 2001 9:49 - 10:10 am; 3rd: May 12, 2001 10:33 - 11:03 PDT).
It can be seen that there is considerable variation from plot (time) to plot (time).
Looking at
EOS network graphs: destination: SLAC of iperf TCP throughput from
destruction.gsfc.nasa.gov (128.183.166.156), an SGI running IRIX 6.5, to SLAC, it can be seen that
there are large (factors of 8)
variations in throughput from hour to hour.
There is also considerable variation in the maximum throughput
that can be achieved, varying by over a factor of 3 from about 25Mbits/s to over
90Mbits/s. This may be caused by variation in the competing load (cross-traffic).
However, the pipechar
indicates that the bottleneck is not within Abilene.
Daresbury
The folks at Daresbury have a UKERNA-funded joint project with SLAC and I2 to
investigate the effectiveness of QoS techniques.
Measurements of iperf throughput from
SLAC to DL and
Stanford to DL are
available.
The host that we have access to at DL is a Linux host which has not been configured to allow
windows of > 64 kBytes, so measurements from DL are limited in their applicability.
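As an illustration of this limitation (the numbers below are ours, not the DL host's actual
settings), a host's effective window ceiling can be seen by asking for a large socket buffer
and reading back what the kernel grants:

    import socket

    REQUESTED = 512 * 1024   # ask for a 512 kByte TCP receive buffer

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
    granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # On a host whose kernel limits have not been raised, 'granted' will be
    # clipped well below the request (e.g. to around 64 kBytes).
    print("requested %d bytes, kernel granted %d bytes" % (REQUESTED, granted))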
We have installed the bbcp network file copy
program at DL and made some
measurements from both SLAC and DL.
We have verified, by using
bbcp to set the QBSS bit at DL and sending the
packets to SLAC, and then
using snoop on a Solaris host at SLAC, that the bit is still set when the
packets reach SLAC.
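For illustration only (the check described above used snoop on Solaris), a rough sketch of how
the TOS/DSCP byte of arriving packets could be inspected on a Linux host with a raw socket
(requires root privileges):

    import socket

    QBSS_DSCP = 0b001000                        # QBSS codepoint; TOS byte 0x20
    ETH_P_IP = 0x0800

    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_IP))
    while True:
        frame, _ = sock.recvfrom(65535)
        ip_header = frame[14:]                  # skip the 14-byte Ethernet header
        tos = ip_header[1]                      # second byte of the IP header is the TOS byte
        src = socket.inet_ntoa(ip_header[12:16])
        if tos >> 2 == QBSS_DSCP:
            print("QBSS-marked packet from %s (TOS byte 0x%02x)" % (src, tos))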
Page owner: Les Cottrell