Passive vs. Active Measurements for BBCP
I-Heng Mei
Updated: Fri May 24 15:06:54 PDT 2002


I was asked to investigate why passive throughput calculations were lower than the active calculations for some nodes.

One can easily think of many reasons why the passive throughput calculations would be higher:

  • passive measurements count the BBCP header as transferred data, whereas the active measurement made by BBCP itself does not
  • passive measurements account for retransmitted packets, whereas the active measurement does not
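Both effects push the passive byte count up. The arithmetic can be sketched as follows (illustrative only, not bbcp's or netflow's actual accounting; header sizes are typical IPv4/TCP values with no options):

```python
# Illustrative arithmetic only -- not bbcp's actual accounting. Header sizes
# are typical values (IPv4 and TCP with no options); real traces may differ.
IP_HDR, TCP_HDR = 20, 20

def passive_bytes(payload, n_packets, bbcp_hdr=0, retrans=0):
    """Bytes a passive monitor would attribute to the transfer."""
    return payload + bbcp_hdr + retrans + n_packets * (IP_HDR + TCP_HDR)

def active_bytes(payload):
    """Bytes bbcp itself reports: the file data only."""
    return payload

payload = 10 * 1024 * 1024            # 10 MB of file data
packets = -(-payload // 1460)         # ceiling division: 1460-byte segments
assert passive_bytes(payload, packets) > active_bytes(payload)
```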

    So it seems counter-intuitive that the passive throughput would be consistently lower than the active throughput at certain nodes.

    By using a packet sniffer, I was able to deduce one of the causes for this behavior.

    The main cause of this behavior is that control messages are not counted by the active measurements. At some nodes this is not noticeable, perhaps because the node is "close" to SLAC and the overhead of control messages before and after the transfer is relatively small.

    At nodes that are further away, this causes problems. For example, here is a sample 15-second transfer from pharlap to node1.roma1.infn.it:

    Active (BBCP output):
    At 020524 23:21:53 copy 0% complete; 0.0 KB/s, avg 0.0 KB/s
    At 020524 23:21:54 copy 0% complete; 0.0 KB/s, avg 0.0 KB/s
    At 020524 23:21:55 copy 0% complete; 960.0 KB/s, avg 461.1 KB/s
    At 020524 23:21:56 copy 0% complete; 704.0 KB/s, avg 539.9 KB/s
    At 020524 23:21:57 copy 0% complete; 2240.0 KB/s, avg 956.4 KB/s
    At 020524 23:21:58 copy 0% complete; 1216.0 KB/s, avg 1007.5 KB/s
    At 020524 23:21:59 copy 0% complete; 1216.0 KB/s, avg 1041.8 KB/s
    At 020524 23:22:00 copy 0% complete; 0.0 KB/s, avg 894.7 KB/s
    At 020524 23:22:01 copy 0% complete; 2688.0 KB/s, avg 1116.6 KB/s
    At 020524 23:22:02 copy 0% complete; 1216.0 KB/s, avg 1127.5 KB/s
    At 020524 23:22:03 copy 0% complete; 1292.9 KB/s, avg 1143.8 KB/s
    At 020524 23:22:04 copy 0% complete; 506.9 KB/s, avg 1085.7 KB/s
    At 020524 23:22:05 copy 0% complete; 1280.0 KB/s, avg 1101.8 KB/s
    At 020524 23:22:06 copy 0% complete; 1280.0 KB/s, avg 1115.4 KB/s
    At 020524 23:22:07 copy 0% complete; 1472.0 KB/s, avg 1140.7 KB/s
    Time limit exceeded
    At 020524 23:22:08 copy 0% complete; 384.0 KB/s, avg 1090.6 KB/s
    
    Passive (tcpdump analysis): KB/s for each one-second interval
    interval(s) |   KB/s  | avg KB/s
      0 -  1        1.1       1.1
      1 -  2     1801.0     901.1
      2 -  3     1055.5     952.5
      3 -  4     1301.2    1039.7
      4 -  5     1355.4    1102.8
      5 -  6     1310.1    1137.4
      6 -  7     1254.9    1154.2
      7 -  8     1406.3    1185.7
      8 -  9     1072.9    1173.2
      9 - 10     1044.1    1160.3
     10 - 11     1412.9    1183.2
     11 - 12     1187.5    1183.6
     12 - 13     1004.2    1169.8
     13 - 14     1162.4    1169.3
     14 - 15     1023.4    1159.5
     15 - 16     1090.0    1155.2
     16 - 17      274.9    1103.4
     17 - 18       56.2    1045.2
     18 - 19        0.0     990.2
     19 - 20        0.0     940.7
     20 - 21        0.0     895.9
     21 - 22        0.0     855.2
     22 - 23        0.0     818.0
     23 - 24        0.0     783.9
    
    

    Notice that, according to the passive analysis, the transmission lasts around 24 seconds, whereas the active measurement considered it to last 15 seconds. Also, from second 16 onward in the tcpdump analysis, the number of bytes transferred drops sharply. Even more significant, from seconds 18 through 24 there are still TCP packets being sent, although none of them carry any payload data. They are merely control packets (and as far as I could tell, they were all FINs and ACKs). This final interval of control packets causes the passive calculations to yield a lower KB/s.
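The two ways of drawing the interval boundary can be sketched like this (a hypothetical helper, not the script actually used; it assumes per-packet (timestamp, payload-length) pairs have already been extracted from the tcpdump trace):

```python
def throughputs(packets):
    """packets: sorted (time_sec, payload_bytes) pairs for one transfer.
    Returns (passive_kbs, data_only_kbs): throughput over the whole
    connection lifetime vs. over the data-carrying interval only."""
    total = sum(size for _, size in packets)
    t_first, t_last = packets[0][0], packets[-1][0]
    data_times = [t for t, size in packets if size > 0]
    passive = total / 1024.0 / (t_last - t_first)
    data_only = total / 1024.0 / (data_times[-1] - data_times[0])
    return passive, data_only

# Toy trace: 17 data packets over seconds 0-16, then an 8-second tail of
# empty control packets (FINs and ACKs) carrying no payload.
trace = [(t, 1200000) for t in range(17)] + [(t, 0) for t in range(18, 25)]
passive, data_only = throughputs(trace)
assert passive < data_only    # the control tail dilutes the passive average
```

The data-only number corresponds to what bbcp's active measurement reflects; the full-lifetime number is what a passive monitor computes.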

    When BBCP uses only one stream, the above effect is essentially non-existent. As the number of parallel streams increases, the amount of time spent at the end of the transfer sending only control packets increases as well.

    This phenomenon is more noticeable at certain nodes, especially ones that are further away, like *.it.

    These lingering sockets occur because, even after the application requests that the socket connections be closed, it may take the kernel a few seconds to actually close them. This inflates the elapsed time for passive measurements, but not for active measurements.

    RECOMMENDATIONS, PROPOSED FIXES

    1. If there is something that the BBCP program can do to prevent this behavior, then that would fix the problem. (Andy, do you know what is happening? And is it within BBCP's power to prevent this?) Andy recently modified the Solaris version of BBCP to close sockets earlier, which cuts the dead time at the end roughly in half. (He disabled the SO_LINGER socket option.)
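For reference, the SO_LINGER knob looks like this (a hedged Python sketch; the actual bbcp change is in its own code and is not reproduced here). Setting l_onoff=1 with l_linger=0 makes close() abort the connection with a RST instead of performing the normal FIN/ACK shutdown, so no control-packet tail trails the transfer:

```python
import socket
import struct

def set_abortive_close(sock):
    """With l_onoff=1 and l_linger=0, close() aborts the connection with a
    RST instead of the usual FIN/ACK exchange, so no control packets linger."""
    # struct linger { int l_onoff; int l_linger; }
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_abortive_close(s)
onoff, linger = struct.unpack(
    "ii", s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 8))
s.close()
```

The trade-off is that an abortive close discards any unsent data and skips TIME_WAIT, which is why it is not the default.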

    2. We could simply increase the timeout. This would not prevent the behavior, but would lessen its effect. With a timeout > 120 seconds, the active and passive measurements (from pharlap to roma1.infn.it) were within 2% of each other, while with a timeout of 15 seconds the difference ranged from 20-30%. I suspect that increasing the timeout would also increase the correlation between passive and active, but I haven't tested that.
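The effect of a longer timeout follows from simple arithmetic: if the control-packet tail after the data stops is roughly constant, its contribution to the passive average shrinks as the transfer itself grows longer. A sketch (the 8-second tail is illustrative, not a measured value):

```python
def passive_deficit(transfer_s, tail_s):
    """Fraction by which the passive average falls below the active one,
    assuming equal byte counts and a data-free tail of tail_s seconds."""
    return float(tail_s) / (transfer_s + tail_s)

# With an 8-second tail (illustrative, not measured), a 15 s transfer
# loses far more of its passive average than a 120 s transfer does.
assert passive_deficit(120, 8) < passive_deficit(15, 8)
```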

    3. Another option would be to use tcpdump instead of netflow to make passive measurements for bbcp. We could then focus the throughput calculation solely on the actual data transfer (much like the active measurements do). But I suppose that this option is unrealistic?

    4. Yet another option: don't change the bandwidth monitoring code. On the public Bandwidth Measuring and Monitoring webpage, just explain some of the reasons why the passive measurements are lower at certain nodes.