Problems with Oracle between SLAC and ORNL, Jan 2006
Les Cottrell (SLAC) and Bill Wing (ORNL)
Page created: February 2, 2006
From: Lahey, Terri E.
Sent: Wednesday, February 01, 2006 2:18 PM
To: Cottrell, Les
Cc: Lahey, Terri E.
Subject: connection to SNS

Les,

Is the connection between SLAC and SNS (ornl) running at high throughput? The LCLS software team is testing XAL code connecting to an RDB (ORACLE, I think) using servers at SNS (ORNL). They see slowness, and are trying to identify where it is. They are connecting to snsdb1.sns.ornl.gov (port: 1521).

My guess is that this is slowness once they get into SNS, or on their application desktop. Can you check if packet traffic is fast between SLAC and SNS, so we can eliminate that?

Terri

ps. here's a traceroute that looks like icmp is disabled into SNS.

lahey@flora03 $ traceroute snsdb1.sns.ornl.gov
traceroute: Warning: ckecksums disabled
traceroute to snsdb1.sns.ornl.gov (160.91.230.34), 30 hops max, 40 byte packets
 1  rtrg-nethub.slac.stanford.edu (134.79.19.1)  0.414 ms  0.284 ms  0.217 ms
 2  134.79.255.25 (134.79.255.25)  11.183 ms  0.504 ms  0.351 ms
 3  rtr-dmz1-ger.slac.stanford.edu (134.79.135.15)  0.335 ms  0.341 ms  0.347 ms
 4  192.68.191.146 (192.68.191.146)  0.469 ms  0.515 ms  0.463 ms
 5  slacmr1-slacrt4.es.net (134.55.209.93)  0.584 ms  0.481 ms  0.461 ms
 6  snv2mr1-slacmr1.es.net (134.55.217.2)  0.838 ms  0.785 ms  0.723 ms
 7  snv1mr1-snv2mr1.es.net (134.55.217.5)  0.837 ms  0.752 ms  0.721 ms
 8  snvcr1-snv1mr1.es.net (134.55.218.21)  0.838 ms  0.824 ms  0.845 ms
 9  elpcr1-oc48-snvcr1.es.net (134.55.209.218)  27.545 ms  27.522 ms  27.553 ms
10  atlcr1-oc48-elpcr1.es.net (134.55.209.222)  61.824 ms  61.692 ms  61.869 ms
11  ornl-oc48-atlcr1.es.net (134.55.213.210)  66.308 ms  66.467 ms  66.357 ms
12  192.31.96.1 (192.31.96.1)  68.730 ms  68.145 ms  66.492 ms
13  * * *
14  * * *

Over the phone Terri also indicated that the time to run the "transaction" between hosts at ORNL is 3-10 seconds, while between SLAC and ORNL it is 25-30 secs.
For comparison, a traceroute from ORNL back to SLAC:

traceroute to iepm-resp.slac.stanford.edu (134.79.240.36), 64 hops max, 40 byte packets
 1  swgecsb-1-004.ens.ornl.gov (160.91.212.1)  0.667 ms  0.252 ms  0.208 ms
 2  ornlgwy.ens.ornl.gov (160.91.0.1)  0.272 ms  0.229 ms  0.241 ms
 3  orgwy-fw (192.31.96.161)  0.558 ms  0.501 ms  0.397 ms
 4  ornl-rt3-ge.cind.ornl.gov (192.31.96.2)  0.560 ms  0.576 ms  0.446 ms
 5  atlcr1-oc48-ornl.es.net (134.55.213.209)  5.201 ms  99.826 ms  29.394 ms
 6  elpcr1-oc48-atlcr1.es.net (134.55.209.221)  39.546 ms  39.549 ms  39.469 ms
 7  snvcr1-oc48-elpcr1.es.net (134.55.209.217)  66.154 ms  66.136 ms  69.778 ms
 8  snv1mr1-snvcr1.es.net (134.55.218.22)  66.187 ms  66.198 ms  66.255 ms
 9  snv2mr1-snv1mr1.es.net (134.55.217.6)  66.192 ms  66.197 ms  66.374 ms
10  slacmr1-snv2mr1.es.net (134.55.217.1)  66.658 ms  67.601 ms  66.668 ms
11  slacrt4-slacmr1.es.net (134.55.209.94)  66.671 ms  66.737 ms  66.555 ms
12  rtr-dmz1-vlan400.slac.stanford.edu (192.68.191.149)  66.868 ms  66.905 ms  66.703 ms
13  * * *
14  * * *
15  iepm-resp.slac.stanford.edu (134.79.240.36)  66.599 ms  66.957 ms  66.775 ms

Comparing that with what Terri saw:
lahey@flora03 $ traceroute snsdb1.sns.ornl.gov
traceroute: Warning: ckecksums disabled
traceroute to snsdb1.sns.ornl.gov (160.91.230.34), 30 hops max, 40 byte packets
 1  rtrg-nethub.slac.stanford.edu (134.79.19.1)  0.414 ms  0.284 ms  0.217 ms
 2  134.79.255.25 (134.79.255.25)  11.183 ms  0.504 ms  0.351 ms
 3  rtr-dmz1-ger.slac.stanford.edu (134.79.135.15)  0.335 ms  0.341 ms  0.347 ms
 4  192.68.191.146 (192.68.191.146)  0.469 ms  0.515 ms  0.463 ms
 5  slacmr1-slacrt4.es.net (134.55.209.93)  0.584 ms  0.481 ms  0.461 ms
 6  snv2mr1-slacmr1.es.net (134.55.217.2)  0.838 ms  0.785 ms  0.723 ms
 7  snv1mr1-snv2mr1.es.net (134.55.217.5)  0.837 ms  0.752 ms  0.721 ms
 8  snvcr1-snv1mr1.es.net (134.55.218.21)  0.838 ms  0.824 ms  0.845 ms
 9  elpcr1-oc48-snvcr1.es.net (134.55.209.218)  27.545 ms  27.522 ms  27.553 ms
10  atlcr1-oc48-elpcr1.es.net (134.55.209.222)  61.824 ms  61.692 ms  61.869 ms
11  ornl-oc48-atlcr1.es.net (134.55.213.210)  66.308 ms  66.467 ms  66.357 ms
12  192.31.96.1 (192.31.96.1)  68.730 ms  68.145 ms  66.492 ms
13  * * *

These look to be the inverse of each other inside ESnet, but aren't symmetric inside Stanford/SLAC.
Next we used fpingroute.pl to ping each router along the route 1000 times:

112cottrell@noric01:~>bin/fpingroute.pl -i 3 -c 1000 snsdb1.sns.ornl.gov
Wed Feb 1 15:11:25 2006
Architecture=LINUX, commands=/usr/sbin/traceroute -q 1 and fping snsdb1.sns.ornl.gov
fpingroute.pl version=0.21, 8/24/04. Author cottrell@slac.stanford.edu, debug=1
using traceroute to get nodes in route from noric01 (134.79.86.51) to snsdb1.sns.ornl.gov starting at node 3
traceroute to snsdb1.sns.ornl.gov (160.91.230.34), 30 hops max, 38 byte packets
fpingroute.pl version 0.21, 8/24/04 found 30 hops in route from noric01 to snsdb1.sns.ornl.gov
 3  slac-rt4.es.net (192.68.191.146)  0.292 ms
 4  slacmr1-slacrt4.es.net (134.55.209.93)  0.255 ms
 5  snv2mr1-slacmr1.es.net (134.55.217.2)  0.868 ms
 6  snv1mr1-snv2mr1.es.net (134.55.217.5)  0.732 ms
 7  snvcr1-snv1mr1.es.net (134.55.218.21)  0.822 ms
 8  elpcr1-oc48-snvcr1.es.net (134.55.209.218)  27.571 ms
 9  atlcr1-oc48-elpcr1.es.net (134.55.209.222)  61.616 ms
10  ornl-oc48-atlcr1.es.net (134.55.213.210)  66.251 ms
11  192.31.96.1 (192.31.96.1)  66.318 ms
12  *
...
30  *
Wed Feb 1 15:13:01 2006 wrote 28 addresses to /tmp/fpingaddr
now ping each address 1000 times from noric01 starting at hop 3 ...
pings/node=1000                       100 byte packets           1400 byte packets
to NODE (from noric01)             %loss   min    max   avg   %loss   min    max   avg
 3 slac-rt4.es.net                  0.0%   0.5   45.3   1.4    0.0%   1.0   50.6   1.7
 4 slacmr1-slacrt4.es.net           0.0%   0.3  229.3   5.3    0.0%   0.7  221.7   5.5
 5 snv2mr1-slacmr1.es.net           0.0%   0.7  197.6   6.5    0.0%   1.0  218.7   6.5
 6 snv1mr1-snv2mr1.es.net           0.0%   0.7  215.8   5.6    0.1%   1.1  217.4   6.4
 7 snvcr1-snv1mr1.es.net            0.0%   0.8   24.6   1.1    0.0%   1.5   33.8   1.8
 8 elpcr1-oc48-snvcr1.es.net        0.0%  27.6  100.0  30.0    0.0%  28.5  110.3  30.9
 9 atlcr1-oc48-elpcr1.es.net        0.0%  61.8  133.4  64.9    0.0%  62.7  140.0  66.2
10 ornl-oc48-atlcr1.es.net          0.0%  66.4  113.1  67.4    0.0%  67.2  135.6  68.5
11 192.31.96.1                      0.1%  66.2  272.1  67.6    0.6%  66.7   95.8  67.0
Wed Feb 1 15:46:41 2006 fpingroute.pl done.
113cottrell@noric01:~>

This indicates low loss at least until the last hop, and little dispersion in the RTTs at El Paso and beyond. Unfortunately pings to the end host are blocked, so we cannot measure the end-to-end losses. If we use the Mathis formula to estimate the maximum standard TCP throughput, the 0.1% loss seen at hop 11 would limit us to about 6.5 Mbits/s, and the 0.6% loss to about 2.6 Mbits/s.
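For reference, the Mathis et al. formula bounds standard TCP throughput by (MSS/RTT) x C/sqrt(p), where p is the packet-loss probability and C ~ sqrt(3/2). A minimal sketch of the calculation in Python (the 1460-Byte MSS and the value of C are our assumptions; the ~70 ms RTT and the loss rates come from the measurements above):

from math import sqrt

MSS_BITS = 1460 * 8   # assumed TCP maximum segment size, in bits
RTT = 0.070           # measured round-trip time, ~70 ms
C = sqrt(1.5)         # Mathis constant, ~1.22

def mathis_limit(loss):
    """Mathis et al. ceiling on standard TCP throughput, in bits/s."""
    return (MSS_BITS / RTT) * C / sqrt(loss)

for loss in (0.001, 0.006):   # the 0.1% and 0.6% losses seen at hop 11
    print("loss %.1f%% -> %.1f Mbits/s" % (loss * 100, mathis_limit(loss) / 1e6))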
We also ran pathneck to look for choke points along the path:

8cottrell@iepm-resp:~>sudo /afs/slac.stanford.edu/package/netperf/bin/@sys/pathneck snsdb1.sns.ornl.gov
1138834332.027564 160.91.230.34 500 60 0
00  0.227 134.79.243.1    1038
01  0.225 134.79.252.5    1019
02  0.254 134.79.135.15   1009
03  0.346 192.68.191.146   957
04  0.345 134.55.209.93    926
05  0.718 134.55.217.2     972
06  0.753 134.55.217.5     920
07  0.772 134.55.218.21    926
08 27.492 134.55.209.218   908
09 61.680 134.55.209.222   877
10 66.239 134.55.213.210   946
11 66.303 192.31.96.1      817

Then we looked at the router utilization plots for the ESnet routers along the route. Hops 6-10 showed light utilization.
Next we measured the available bandwidth in each direction with abing:

90cottrell@iepm-resp:~>abing -t 5 -n 10 -b 80 -d wrw.ornl.gov
1138854487 T: 160.91.212.99 ABw-Xtr-DBC: 940.7 55.3 996.0 ABW: 940.7 Mbps RTT: 67.128 70.319 100.956 ms 80 80
1138854487 F: 160.91.212.99 ABw-Xtr-DBC: 4.0 817.2 821.2 ABW: 4.0 Mbps RTT: 67.128 70.319 100.956 ms 80 80
1138854494 T: 160.91.212.99 ABw-Xtr-DBC: 905.4 94.6 1000.0 ABW: 931.9 Mbps RTT: 67.106 70.098 100.808 ms 80 80
1138854494 F: 160.91.212.99 ABw-Xtr-DBC: 4.2 782.8 786.9 ABW: 4.0 Mbps RTT: 67.106 70.098 100.808 ms 80 80
1138854502 T: 160.91.212.99 ABw-Xtr-DBC: 826.2 172.3 998.6 ABW: 905.5 Mbps RTT: 67.126 69.991 100.840 ms 80 80
1138854502 F: 160.91.212.99 ABw-Xtr-DBC: 4.5 669.9 674.4 ABW: 4.1 Mbps RTT: 67.126 69.991 100.840 ms 80 80
1138854509 T: 160.91.212.99 ABw-Xtr-DBC: 956.0 43.3 999.4 ABW: 918.1 Mbps RTT: 67.132 67.831 96.850 ms 80 80
1138854509 F: 160.91.212.99 ABw-Xtr-DBC: 19.6 598.0 617.6 ABW: 8.0 Mbps RTT: 67.132 67.831 96.850 ms 80 80
1138854517 T: 160.91.212.99 ABw-Xtr-DBC: 983.4 16.6 1000.0 ABW: 934.4 Mbps RTT: 67.130 69.322 100.845 ms 80 80
1138854517 F: 160.91.212.99 ABw-Xtr-DBC: 5.7 677.4 683.0 ABW: 7.4 Mbps RTT: 67.130 69.322 100.845 ms 80 80
1138854525 T: 160.91.212.99 ABw-Xtr-DBC: 971.0 28.7 999.6 ABW: 943.6 Mbps RTT: 67.137 68.124 86.116 ms 80 80
1138854525 F: 160.91.212.99 ABw-Xtr-DBC: 13.1 633.1 646.2 ABW: 8.8 Mbps RTT: 67.137 68.124 86.116 ms 80 80
1138854532 T: 160.91.212.99 ABw-Xtr-DBC: 1000.0 0.0 1000.0 ABW: 957.7 Mbps RTT: 67.147 68.956 100.861 ms 80 80
1138854532 F: 160.91.212.99 ABw-Xtr-DBC: 6.8 740.6 747.4 ABW: 8.3 Mbps RTT: 67.147 68.956 100.861 ms 80 80
1138854540 T: 160.91.212.99 ABw-Xtr-DBC: 910.7 89.3 1000.0 ABW: 945.9 Mbps RTT: 67.117 67.933 100.696 ms 80 80
1138854540 F: 160.91.212.99 ABw-Xtr-DBC: 16.8 714.7 731.6 ABW: 10.5 Mbps RTT: 67.117 67.933 100.696 ms 80 80
1138854547 T: 160.91.212.99 ABw-Xtr-DBC: 990.6 9.4 1000.0 ABW: 957.1 Mbps RTT: 67.137 70.655 100.833 ms 80 80
1138854547 F: 160.91.212.99 ABw-Xtr-DBC: 3.5 367.6 371.0 ABW: 8.7 Mbps RTT: 67.137 70.655 100.833 ms 80 80
1138854555 T: 160.91.212.99 ABw-Xtr-DBC: 962.4 37.6 1000.0 ABW: 958.4 Mbps RTT: 67.131 68.161 100.704 ms 80 80
1138854555 F: 160.91.212.99 ABw-Xtr-DBC: 12.7 676.4 689.1 ABW: 9.7 Mbps RTT: 67.131 68.161 100.704 ms 80 80
(Avg/Sdev) RTT: 69.139/1.027 ms ABW To: 944.647/49.543 From: 9.085/5.654 Mbits/s Exit 82
91cottrell@iepm-resp:~>

Here T ("To") = from SLAC to ORNL, F ("From") = from ORNL to SLAC, ABw = available bandwidth, Xtr = cross-traffic, and DBC = dynamic bandwidth capacity. There is little congestion (low Xtr) and plenty of available bandwidth from SLAC to ORNL; from ORNL to SLAC there appears to be substantial cross-traffic and far less available bandwidth (~9 Mbits/s on average).
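The three ABw-Xtr-DBC numbers are related: the available bandwidth is roughly the capacity left over after cross-traffic. A small Python sketch of that check (the line is copied from the output above; the field positions are our reading of it):

line = "1138854487 T: 160.91.212.99 ABw-Xtr-DBC: 940.7 55.3 996.0 ABW: 940.7 Mbps RTT: 67.128 70.319 100.956 ms 80 80"

fields = line.split()
direction = fields[1].rstrip(":")        # "T" = SLAC -> ORNL, "F" = ORNL -> SLAC
abw, xtr, dbc = map(float, fields[4:7])  # Mbits/s

# Available bandwidth ~ dynamic bandwidth capacity minus cross-traffic.
print("%s: ABw=%.1f, DBC-Xtr=%.1f Mbits/s" % (direction, abw, dbc - xtr))
# -> T: ABw=940.7, DBC-Xtr=940.7 Mbits/s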
Running a single iperf TCP stream from ORNL (wrw.ornl.gov) to SLAC with a 1 MByte window gave:

Client connecting to iepm-resp.slac.stanford.edu, TCP port 5001
TCP window size: 1.00 MByte (WARNING: requested 1.00 MByte)
------------------------------------------------------------
[  3] local 160.91.212.99 port 49396 connected with 134.79.240.36 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  66.4 MBytes   111 Mbits/sec
[  3]  5.0-10.0 sec  73.4 MBytes   123 Mbits/sec
[  3] 10.0-15.0 sec  72.3 MBytes   121 Mbits/sec
[  3] 15.0-20.0 sec  20.3 MBytes  34.1 Mbits/sec
[  3] 20.0-25.0 sec  30.4 MBytes  51.0 Mbits/sec
[  3] 25.0-30.0 sec  33.6 MBytes  56.4 Mbits/sec
[  3] 30.0-35.0 sec  37.1 MBytes  62.2 Mbits/sec
[  3] 35.0-40.0 sec  39.5 MBytes  66.3 Mbits/sec
[  3] 40.0-45.0 sec  42.2 MBytes  70.9 Mbits/sec
[  3] 45.0-50.0 sec  45.6 MBytes  76.5 Mbits/sec
[  3] 50.0-55.0 sec  48.9 MBytes  82.0 Mbits/sec
[  3] 55.0-60.0 sec  39.4 MBytes  66.1 Mbits/sec
[  3]  0.0-60.5 sec   549 MBytes  76.2 Mbits/sec

With an RTT of ~70 ms, the bandwidth-delay product (BDP) predicts that a 1 MByte window can sustain at most about 115-120 Mbits/s, which matches the first three intervals above.
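A window-limited TCP connection can have at most one window of data in flight per round trip, so its ceiling is window/RTT. A quick check in Python (window and RTT taken from the test above):

# Window-limited TCP throughput ceiling: one full window per RTT.
window_bytes = 1 * 1024 * 1024   # the 1 MByte window requested above
rtt = 0.070                      # ~70 ms round-trip time

ceiling = window_bytes * 8 / rtt          # bits/s
print("%.0f Mbits/s" % (ceiling / 1e6))   # -> ~120 Mbits/s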
Using multiple parallel streams from ORNL to SLAC we get:
wrw:~/cstuff/iperf wrw$ ./iperf -c iepm-resp.slac.stanford.edu -w 1m -t 60 -i 200 -P 10
------------------------------------------------------------
Client connecting to iepm-resp.slac.stanford.edu, TCP port 5001
TCP window size: 1.00 MByte (WARNING: requested 1.00 MByte)
------------------------------------------------------------
[ 12] local 160.91.212.99 port 49662 connected with 134.79.240.36 port 5001
[  3] local 160.91.212.99 port 49653 connected with 134.79.240.36 port 5001
[  5] local 160.91.212.99 port 49655 connected with 134.79.240.36 port 5001
[  4] local 160.91.212.99 port 49654 connected with 134.79.240.36 port 5001
[ 11] local 160.91.212.99 port 49661 connected with 134.79.240.36 port 5001
[  6] local 160.91.212.99 port 49656 connected with 134.79.240.36 port 5001
[  8] local 160.91.212.99 port 49658 connected with 134.79.240.36 port 5001
[  9] local 160.91.212.99 port 49659 connected with 134.79.240.36 port 5001
[ 10] local 160.91.212.99 port 49660 connected with 134.79.240.36 port 5001
[  7] local 160.91.212.99 port 49657 connected with 134.79.240.36 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 11]  0.0-60.3 sec   146 MBytes  20.3 Mbits/sec
[  5]  0.0-60.5 sec   134 MBytes  18.6 Mbits/sec
[ 12]  0.0-60.5 sec  99.6 MBytes  13.8 Mbits/sec
[  9]  0.0-60.7 sec   132 MBytes  18.3 Mbits/sec
[  3]  0.0-60.8 sec   178 MBytes  24.6 Mbits/sec
[  7]  0.0-60.8 sec   173 MBytes  23.9 Mbits/sec
[ 10]  0.0-60.8 sec   223 MBytes  30.8 Mbits/sec
[  8]  0.0-61.0 sec   218 MBytes  29.9 Mbits/sec
[  6]  0.0-61.3 sec   183 MBytes  25.0 Mbits/sec
[  4]  0.0-62.3 sec   161 MBytes  21.7 Mbits/sec
[SUM]  0.0-62.3 sec  1.61 GBytes   222 Mbits/sec

From SLAC to ORNL (the ORNL max receive window was set to 1 MByte):
4cottrell@iepm-resp:~>iperf -c wrw.ornl.gov -w 1m -t 60 -i 5
------------------------------------------------------------
Client connecting to wrw.ornl.gov, TCP port 5001
TCP window size: 20.0 MByte (WARNING: requested 10.0 MByte)
------------------------------------------------------------
[  3] local 134.79.240.36 port 43536 connected with 160.91.212.99 port 5001
[  3]  0.0- 5.0 sec  59.8 MBytes   100 Mbits/sec
[  3]  5.0-10.0 sec  20.0 MBytes  33.5 Mbits/sec
[  3] 10.0-15.0 sec  15.0 MBytes  25.2 Mbits/sec
[  3] 15.0-20.0 sec  9.90 MBytes  16.6 Mbits/sec
[  3] 20.0-25.0 sec  14.9 MBytes  24.9 Mbits/sec
[  3] 25.0-30.0 sec  15.0 MBytes  25.1 Mbits/sec
[  3] 30.0-35.0 sec  14.9 MBytes  25.0 Mbits/sec
[  3] 35.0-40.0 sec  9.90 MBytes  16.6 Mbits/sec
[  3] 40.0-45.0 sec  4.96 MBytes  8.32 Mbits/sec
[  3] 45.0-50.0 sec  14.9 MBytes  25.0 Mbits/sec
[  3] 50.0-55.0 sec  10.1 MBytes  16.9 Mbits/sec
[  3] 55.0-60.0 sec  4.98 MBytes  8.35 Mbits/sec
[  3]  0.0-60.3 sec   194 MBytes  27.0 Mbits/sec

Slow start ramps up to about 100 Mbits/s, but after that the throughput is limited to 10-30 Mbits/s. Using multiple (8) parallel streams with a 1 MByte window we got about 100 Mbits/s.
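Parallel streams help because each stream is individually capped by its window and by loss, while the caps add across streams. A rough Python sketch of this scaling (the MSS, the Mathis constant, and the illustrative 0.03% per-stream loss rate are our assumptions, chosen only to show how a handful of loss-limited streams can reach ~100 Mbits/s in aggregate):

from math import sqrt

# Each stream is capped by the smaller of its window limit (BDP) and
# its loss limit (Mathis); N streams give roughly N times that cap.
MSS_BITS = 1460 * 8
RTT = 0.070
C = sqrt(1.5)

def per_stream_cap(window_bytes, loss):
    window_limit = window_bytes * 8 / RTT              # bits/s
    mathis_limit = (MSS_BITS / RTT) * C / sqrt(loss)   # bits/s
    return min(window_limit, mathis_limit)

# E.g. 8 streams, 1 MByte windows, a hypothetical 0.03% loss per stream:
print("%.0f Mbits/s" % (8 * per_stream_cap(1 << 20, 0.0003) / 1e6))   # -> ~94 Mbits/s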
The reason we get higher iperf throughput from ORNL to SLAC than vice versa may be the asymmetric utilization of the inbound (to SLAC) and outbound links at the SLAC border seen here: the outbound link is sustaining 500-700 Mbits/s, while the inbound link typically carries less than 100 Mbits/s.
Turning to the application's own flows between SLAC and ORNL: the flows are all very short (the maximum number of uni-directional packets in a flow is 172, the average 141). These flows are short enough that TCP, over such a long-RTT link, never gets out of startup: the bandwidth*delay product for a 100 Mbits/s bottleneck on a 70 ms RTT is about 600 1500-Byte segments. The distribution of flow sizes from SLAC to ORNL is multi-modal, with two major peaks at about 104 Bytes and 31000 Bytes. From ORNL to SLAC it is also mainly bi-modal, with peaks at 104 Bytes and 174000 Bytes (166 packets). A 166-packet flow on such a link is still in slow start throughout, so even doubling the number of packets in flight each RTT, the best one could hope for in TCP throughput is about 3 Mbits/s, as the sketch below shows.
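A minimal Python sketch of that slow-start bound (assuming an initial window of one segment that doubles each RTT, 1500-Byte packets, and a 70 ms RTT):

# Throughput implied by delivering n packets entirely within slow start,
# where the congestion window doubles once per RTT.
def slow_start_throughput(n_packets, pkt_bytes=1500, rtt=0.070):
    sent, cwnd, rtts = 0, 1, 0
    while sent < n_packets:
        sent += cwnd   # one congestion window's worth of packets per RTT
        cwnd *= 2
        rtts += 1
    return n_packets * pkt_bytes * 8 / (rtts * rtt)   # bits/s

# The 166-packet flows seen from ORNL to SLAC need 8 RTTs:
print("%.1f Mbits/s" % (slow_start_throughput(166) / 1e6))   # -> ~3.6 Mbits/s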
Since the flows come in closely spaced groups (e.g. ~60 separate flows from ORNL to SLAC in a 431 second interval), we also looked at the aggregate uni-directional throughput for all the flows in a group. To do this we took the total number of Bytes sent in a group of flows from ORNL to SLAC and divided it by the time between the start of the first flow and the end of the last flow. This yielded ~24 MBytes transferred in about 460 seconds across roughly 60 flows, for an aggregate throughput of only about 0.4 Mbits/s, i.e. roughly 7 kbits/s per flow.
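The arithmetic, as a quick check (Python; numbers from the paragraph above):

# Aggregate throughput for a group of flows: total bytes divided by the
# span from the start of the first flow to the end of the last.
total_bytes = 24e6     # ~24 MBytes across the group
span = 460.0           # seconds, first-flow start to last-flow end
n_flows = 60

aggregate = total_bytes * 8 / span   # bits/s
print("%.2f Mbits/s aggregate, %.1f kbits/s per flow"
      % (aggregate / 1e6, aggregate / n_flows / 1e3))
# -> 0.42 Mbits/s aggregate, 7.0 kbits/s per flow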