Loss of connectivity from BINP/Novosibirsk to SLAC

Les Cottrell. Page created: April 18, 2005



Serge Belov of BINP Novosibirsk reported the following by email around 6:05am on April 18, 2005:

Our physicists noted that for some time we have had no connectivity to SLAC. Our packets go out via our main Internet connection, and I guess SLAC returns its replies via the KEK channel. Because of this asymmetry the traffic gets blocked at the external firewall.

Our router shows in "sh ip rou" that the SLAC route is absent from the routes we get from KEK, while other ESnet routes are still there, like www.es.net, www.bnl.gov, www.fnal.gov.

mx:belov {102} ping www.slac.stanford.edu
PING www8.slac.stanford.edu ( 56 data bytes
64 bytes from icmp_seq=0 ttl=243 time=300.086 ms
64 bytes from icmp_seq=1 ttl=243 time=297.625 ms
64 bytes from icmp_seq=2 ttl=243 time=297.588 ms
--- www8.slac.stanford.edu ping statistics ---
4 packets transmitted, 3 packets received, 25% packet loss
round-trip min/avg/max/std-dev = 297.588/298.433/300.086/1.168 ms
Thus we see that ICMP packets are getting through, although the RTT is not the usual 200 ms.
mx:belov {103} traceroute !$
traceroute www.slac.stanford.edu
traceroute to www8.slac.stanford.edu (, 64 hops max, 40 byte packets
 1  rtc-gw (  0.652 ms  0.761 ms  0.557 ms
 2  NSC-FO-c3550-INP.nsc.ru (  0.957 ms  1.19 ms  1.17 ms
 3  s3550-12a-unknown-nsc.sbras.ru (  6.607 ms  1.55 ms  0.983 ms
 4  s3750-48a-ge.sbras.ru (  0.874 ms  1.64 ms  0.746 ms
 5  r7206-ge.sbras.ru (  2.498 ms  2.1 ms  2.43 ms
 6 (  3.271 ms  2.204 ms  4.837 ms
 7  SM-TCMS5-RBNet-2.RBNet.ru (  34.949 ms  33.875 ms  34.694 ms
 8  MSK-M9-RBNet-7.RBNet.ru (  50.952 ms  50.707 ms  50.800 ms
 9  AMS-RBNet-1.RBNet.ru (  91.85 ms  90.67 ms  89.725 ms
10  Chicago-RBNet-1.rbnet.ru (  194.388 ms  193.805 ms  194.564 ms
11  chi-gev156-naukanet.es.net (  249.276 ms  248.924 ms  249.639 ms
12  chicr1-ge0-chirt1.es.net (  249.488 ms  248.933 ms  264.646 ms
This trace shows that our probes are indeed going via RBNet, which creates the asymmetry mentioned above.
binp-gw>sh ip ro www.slac.stanford.edu
% Network not in table
This proves that the network really is not in our router's table, while the others are.
binp-gw>sh ip ro www.es.net           
Translating "www.es.net"...domain server ( [OK]
Routing entry for, supernet
  Known via "bgp 5402", distance 20, metric 0
  Tag 2505, type external
  Last update from 4d07h ago
  Routing Descriptor Blocks:
  *, from, 4d07h ago
      Route metric is 0, traffic share count is 1
      AS Hops 4
binp-gw>sh ip ro www.fnal.gov
Translating "www.fnal.gov"...domain server ( [OK]
Routing entry for
  Known via "bgp 5402", distance 20, metric 0
  Tag 2505, type external
  Last update from 4d07h ago
  Routing Descriptor Blocks:
  *, from, 4d07h ago
      Route metric is 0, traffic share count is 1
      AS Hops 4
binp-gw>sh ip bgp sum
BGP router identifier, local AS number 5402
BGP table version is 28506, main routing table version 28506
156 network entries and 156 paths using 20748 bytes of memory
44 BGP path attribute entries using 2640 bytes of memory
26 BGP AS-PATH entries using 656 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
14 BGP filter-list cache entries using 168 bytes of memory
BGP activity 12682/12526 prefixes, 14219/14063 paths, scan interval 60 secs
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
                4  2505  153172  150541    28506    0    0 4d07h         131
                4  2683       0       0        0    0    0 never    Active
                4  8756       0       0        0    0    0 never    Active
                4  8756       0       0        0    0    0 never    Active
                4  5387       0       0        0    0    0 never    Active
                4  5387  159002  153597    28506    0    0 1w6d           24

binp-gw>ping www.kek.jp
Translating "www.kek.jp"...domain server ( [OK]
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to, timeout is 2 seconds:
Success rate is 100 percent (5/5), round-trip min/avg/max = 104/116/128 ms
Hope you'll look into the problem. Thank you, Serge Belov
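
The effect Serge describes, where a destination silently falls back to the default route once its specific prefix disappears, can be illustrated with a small longest-prefix-match sketch. The prefixes and next-hop labels here are assumptions chosen for illustration (134.79.0.0/16 for SLAC, a documentation range standing in for another ESnet site); they are not taken from the BINP router.

```python
import ipaddress


def lookup(table, destination):
    """Return the next hop of the most specific route covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None  # the "% Network not in table" case with no default
    # Longest prefix (largest prefixlen) wins.
    _, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop


def make_table(include_slac):
    """Build a toy routing table; all entries are hypothetical."""
    table = {
        ipaddress.ip_network("0.0.0.0/0"): "default (RBNet)",
        # Hypothetical prefix standing in for another ESnet site.
        ipaddress.ip_network("198.51.100.0/24"): "KEK link",
    }
    if include_slac:
        # Assumed SLAC prefix, learned from KEK.
        table[ipaddress.ip_network("134.79.0.0/16")] = "KEK link"
    return table


if __name__ == "__main__":
    for include_slac in (True, False):
        hop = lookup(make_table(include_slac), "134.79.18.1")
        print("SLAC route present:", include_slac, "-> next hop:", hop)
```

With the SLAC prefix present the lookup returns the KEK link; without it the same destination is still reachable, but only through the default route, which is what makes the outbound path differ from the return path.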

Followup and Analysis

Pings from SLAC to rainbow.inp.nsk.su work OK:
4cottrell@noric03:~>ping rainbow.inp.nsk.su
PING rainbow.inp.nsk.su ( 56(84) bytes of data.
64 bytes from rainbow.inp.nsk.su ( icmp_seq=0 ttl=238 time=298 ms
64 bytes from rainbow.inp.nsk.su ( icmp_seq=1 ttl=238 time=299 ms
64 bytes from rainbow.inp.nsk.su ( icmp_seq=1288 ttl=238 time=297 ms

--- rainbow.inp.nsk.su ping statistics ---
1290 packets transmitted, 1289 received, 0% packet loss, time 1301545ms
rtt min/avg/max/mdev = 296.975/299.671/417.648/8.781 ms, pipe 2
The host is not reachable from SLAC by the default (UDP-based) traceroute:
1cottrell@noric06:~>traceroute rainbow.inp.nsk.su
traceroute to rainbow.inp.nsk.su (, 30 hops max, 38 byte packets
 1  rtrg-farm0 (  0.243 ms  0.162 ms  0.157 ms
 2  rtr-dmz1-ger (  0.224 ms  0.197 ms  0.199 ms
 3  slac-rt4.es.net (  0.291 ms  0.255 ms  0.249 ms
 4  snv-pos-slac.es.net (  0.625 ms  0.651 ms  0.594 ms
 5  chicr1-oc192-snvcr1.es.net (  48.748 ms  48.728 ms  48.672 ms
 6  aoacr1-oc192-chicr1.es.net (  68.709 ms  68.691 ms  68.741 ms
 7  aoapr1-ge0-aoacr1.es.net (  68.787 ms  68.787 ms  68.752 ms
 8 (  181.134 ms  181.039 ms  181.067 ms
 9  keksw2-ns.kek.jp (  181.052 ms  181.051 ms  181.036 ms
10  kekcis7.kek.jp (  186.412 ms  186.347 ms  186.556 ms
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
However, ICMP-based traceroutes get through OK:
2cottrell@noric06:~>traceroute -I rainbow.inp.nsk.su
traceroute to rainbow.inp.nsk.su (, 30 hops max, 38 byte packets
 1  rtrg-farm0 (  0.269 ms  0.174 ms  0.160 ms
 2  rtr-dmz1-ger (  0.226 ms  0.199 ms  0.196 ms
 3  slac-rt4.es.net (  0.293 ms  0.256 ms  0.253 ms
 4  snv-pos-slac.es.net (  0.637 ms  0.598 ms  0.666 ms
 5  chicr1-oc192-snvcr1.es.net (  48.678 ms  49.454 ms  48.719 ms
 6  aoacr1-oc192-chicr1.es.net (  68.734 ms  68.700 ms  68.781 ms
 7  aoapr1-ge0-aoacr1.es.net (  68.828 ms  68.783 ms  68.745 ms
 8 (  181.043 ms  181.144 ms  181.218 ms
 9  keksw2-ns.kek.jp (  181.115 ms  181.053 ms  181.122 ms
10  kekcis7.kek.jp (  186.574 ms  186.347 ms  186.415 ms
11  * * *
12  rainbow.inp.nsk.su (  297.008 ms  297.259 ms  299.242 ms
Looking at the PingER data, there is no obvious change in losses or RTT.

Looking at the traceroutes measured by IEPM-BW, it appears we lost traceroute connectivity between 4:33am and 4:43am on April 15th, 2005.

KEK routes

Joe Burrescia of ESnet sent email to KEK at 9:14am saying:

Hello Yamagata-San,

It appears our colleagues in Novosibirsk are not seeing a particular route ESnet announces to KEK at our New York peering. The route ( is to our customer SLAC. It does appear that we are announcing the route to


joeb@aoa-pr1-re0> show route advertising-protocol bgp ...
*           Self                                    3671 3671 I

Could you please verify that this route is being propagated?

Thank you so much for your attention to this.
Yamagata sent email at 6:53pm:

Indeed, we are receiving it.
#sh ip bgp nei rece | incl
*                        0 293 3671 3671 i

But our upper network SINET (AS2907) announces different one.
#sh ip bgp | incl   
* i134.79.0.0             10    100      0 2907 2153 32 3671 i

        |                |
        |                |
        |                |
      ESnet            AS2153
        |                |
        |                |(C)
        |                |
        |      (B)       |
        |                       |
        +---(IP tunnel)---------+

(A) is used from SLAC to BINP and
(B) is used from BINP to SLAC.

At this time,
(C) is used from KEK to SLAC and
(B) is used from SLAC to KEK.

If necessary, we can announce (C) to BINP,
(A) will be used from SLAC to BINP and
(C) will be used from BINP to SLAC.
Should we announce (C) to BINP? Or if this asymmetric route is too harmful, we will change the tunnel endpoint at our side, and send packets from BINP to ESnet into (A).

Firewall Issues

Email from Joe Burrescia at 7:30pm:

Hello Yamagata-San,
Thank you for your diagram and explanation, very helpful. Unfortunately, I do not know how BINP implemented their firewall for blocking routes, so I can't say which of your options will work. Perhaps Serge could comment on what he would like to see from KEK.

Email from Serge Belov 9:36pm:

Hello Joe, Jamagata-San,
I'm afraid our US colleagues will not receive this message immediately so I'll duplicate it from another account, sorry for dups.

Thank you for the investigation. As I wrote yesterday's messages under pressure, I omitted the detailed explanation of such things as local firewalling, though I have already had a lot of trouble with it.

We have the following configuration:

BINP---FWa--+ Ext GW |----->Link to KEK
                +--->Our Default Link-----FWb-->to Global Internet
                      SB RAS (AS5387)
Note there are two FWs in place: the local one operated within our network, and the other, protecting the upper network, AS5387, belonging to SB RAS.

This second one operates in stateful mode, so it creates and remembers the state for each TCP and UDP conversation established through it. Thus no TCP/UDP exchange is possible when there is an asymmetry at Ext GW. The firewall does not specifically filter any routing information, and is almost invisible to most of us except in cases like the current one, when connectivity is broken altogether.

There are several additional features/holes in the filtering policy on FWb. For example, a few of our internal machines (rainbow.inp.nsk.su in particular) are excluded from the filtering ruleset, so they are at least pingable from outside, while most other machines of SB RAS and our LAN are tightly filtered there (I guess).

Last fall I discovered almost complete blocking of a few ESnet networks due to just such a setup and wrong routing on their side. For example, the ORNL network preferred to use its default route to contact us instead of the route it should have received from KEK. We still used our specific route to ORNL received from KEK, and thus all communication with ORNL and one or two other networks from the whole set KEK passes on to us was blocked. I tried to explain this problem to ORNL experts but failed, gave up and just installed at Ext GW a specific route for ORNL pointing to our default, thus restoring the symmetry broken by them.

It seems that the total traffic between ORNL and BINP is negligible, so such a solution was acceptable, but fixing the current problem in such a way would be, I think, a wrong solution, as SLAC is one of our important partners, taking care of the KEK-BINP link all the time, and it should not be prevented from using this channel.

It could be understood that some networks on the ESnet side might introduce similar problems locally, exactly like ORNL, and these problems could be solved, though in such an inelegant way. I don't understand why the routing would be changed in such a specific way on an intermediate, transit system. Why was the routing changed just for SLAC and not for others? Or will we discover in a few days, as we discovered in the case of ORNL, that some other networks are also inaccessible in just the same way as SLAC?

Nevertheless thank all of you for your efforts in investigation, hope the problem will be fixed today.
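
Why a stateful firewall breaks an asymmetric path can be shown with a toy model. The sketch below is an assumption-laden simplification, not a description of how FWb is actually configured: it tracks only the TCP three-way handshake, and it models a firewall that sees the BINP-to-SLAC direction (the default link) but not the replies, which return over the KEK link. The class and function names are hypothetical.

```python
class StatefulFirewall:
    """Toy TCP handshake tracker: passes only packets that fit the
    handshake it has observed so far for that pair of hosts."""

    def __init__(self):
        self.flows = {}  # frozenset({a, b}) -> (initiator, last step seen)

    def observe(self, src, dst, flag):
        key = frozenset((src, dst))
        initiator, step = self.flows.get(key, (None, None))
        if flag == "SYN" and step is None:
            self.flows[key] = (src, "SYN")
            return True
        if flag == "SYN-ACK" and step == "SYN" and dst == initiator:
            self.flows[key] = (initiator, "SYN-ACK")
            return True
        if flag == "ACK" and step in ("SYN-ACK", "ESTABLISHED"):
            self.flows[key] = (initiator, "ESTABLISHED")
            return True
        return False  # packet does not match known state: dropped


def handshake(outbound_via_fw, return_via_fw):
    """Run a BINP-initiated handshake; True if every packet gets through."""
    fw = StatefulFirewall()
    packets = [
        ("binp", "slac", "SYN", outbound_via_fw),
        ("slac", "binp", "SYN-ACK", return_via_fw),
        ("binp", "slac", "ACK", outbound_via_fw),
    ]
    for src, dst, flag, via_fw in packets:
        if via_fw and not fw.observe(src, dst, flag):
            return False
    return True


if __name__ == "__main__":
    print("both directions via firewall:", handshake(True, True))
    print("out via firewall, back via KEK:", handshake(True, False))
    print("both directions via KEK link:", handshake(False, False))
```

In the asymmetric case the firewall sees the SYN and then an ACK, but never the SYN-ACK in between, so the ACK does not match any state it holds and is dropped. Either symmetric arrangement completes the handshake.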


Possible Solutions

Email from Yamagata 11:36pm:

Currently the tunnel router and the serial router are different. If you prefer, we can move the tunnel endpoint to the router that is connected to BINP via the serial line. Then all routes announced by ESnet will be announced to BINP and the route will be symmetric.

This "symmetric" means packets from or to BINP will be sent to the tunnel directly, and it may make performance worse, because the serial router doesn't use an MTU larger than 1500. The current tunnel router uses MTU 1550 because of the encapsulation overhead.

I am thinking of two plans.

  1. Announce the SLAC route of AS2153 to BINP. Quick but asymmetric, and how about 2153 AUP? Is it permitted? I want to get information.
  2. Change the tunnel endpoint. Symmetric, but requires router re-configuration at ESnet and KEK, and maybe worse performance.

Email from Joe Burrescia 8:30am 4/20/05:
I guess I would prefer your option 2) if BINP can live with the 1500 Byte MTU. Seems like this is a cleaner solution and may take care of problems to other ESnet sites, like the one Serge has experienced with ORNL.

Email from Yamagata, Wed 20 April 12:28pm:
If 2) is chosen, please use as the address of endpoint at KEK.

Could you please keep the old tunnel between and until BGP works via new tunnel? I will wait for the new address assignment of new tunnel.

Email from Joe Burrescia Wed 20 April 2:57pm:
OK, I've set our end up. Our tunnel endpoint stays the same I've created a /30 for the tunnel, our end is your end is I've set up our end of the bgp session in passive mode, so once you configure your end it should just come up. (Our bgp source address is I've left the original tunnel up, per your request, until the new tunnel is operational. Please let me know if there is anything else we can do.

Email from Yamagata Wed April 20, 3:22pm:
Thanks, I make up the tunnel and BGP connection just now. SLAC route is announced to BINP,

kekcis7#sh ip bgp nei advertised-routes | include 3671
*>        0 293 3671 3671 i
*>        0 293 3671 3671 i
How about the current status from BINP?

Email from Yukio Karita Wed 20 April 7:10pm:
Les, Joe, and Serge,
One thing you should be aware of is that the 1500-byte MTU limit is not for the user packets but for the encapsulated tunnel packets. The MTU limit for the user packets will be less than 1500 bytes. This means that the usual 1500-byte user packets will be fragmented at the ingress router for the tunnel, which will make the performance worse. Even worse, if some applications use Path MTU Discovery and ICMP packets are blocked at some firewalls, some packets cannot be transferred, and there will be no error message.
Best regards, Yukio

Email from Serge Belov 5:40am 4/26/05:
Thank you for your efforts, it seems everything is working now.
This episode was not as routine as I wanted it to be, as I spent the whole of last week in Moscow and had to investigate and warn everyone remotely, with no access to my mail, tools etc. Right now it seems everything is working almost OK, but a few issues remain to be solved.
Maybe this is due to the reduced MTU and clearing of the DF bit: we observe rather strange behavior of TCP sessions. Interactive sessions work, but bulk-transfer sessions stall. I saw a lot of packets with the DF bit set, and guess that clearing it may be harmful. This needs further investigation.
Current trace from BINP is given below. More details tomorrow, sorry. S.

mx:belov {142} traceroute www.slac.stanford.edu
traceroute to www8.slac.stanford.edu (, 64 hops max, 40 byte packets
 1  rtc-gw (  0.615 ms  0.572 ms  0.552 ms
 2 (  103.863 ms  103.633 ms  103.879 ms
 3  aoa-t1-kek.es.net (  282.556 ms  282.588 ms  282.530 ms
 4  aoacr1-ge0-aoapr1.es.net (  282.604 ms  282.558 ms  282.607 ms
 5  chicr1-oc192-aoacr1.es.net (  302.350 ms  302.646 ms  302.270 ms
 6  snvcr1-oc192-chicr1.es.net (  350.339 ms  350.381 ms  350.557 ms
 7  slac-pos-snv.es.net (  350.655 ms  350.938 ms  350.863 ms
 8  rtr-dmz1-vlan400.slac.stanford.edu (  350.969 ms  350.836 ms  350.897 ms
 9  * * *
10  *^C
At 05:47 AM 4/26/2005, yamagata@nwgpc.kek.jp wrote:
>In message <20050425223421.991D824FFD@beagle.es.net>,
>Joe Burrescia wrote
>> Thank you, I now see the route via the new 
>> Do you believe it is possible to increase the mtu of the tunnel to 
>> match our end at 1550 bytes?
>No, our serial router can't speak MTU larger than 1500 and the actual 
>MTU via tunnel decreases.
>That is why I asked about two plans.
>I want to hear about the current status from SLAC and BINP people.
>Is it out of use?
Email from Yamagata, 6:28am 4/26/05:
This morning I wrote the previous mail and found it wasn't delivered to BINP. I waited a few hours but it still remained. I felt the DF-clear configuration may be bad, and removed the DF-clear configuration from the router. Now, I restored the DF-clear configuration. If this mail is delayed also, I will remove it again.

Email from Yamagata, 6:32am 4/26/05:
At this time, it seems to work and I keep it. The DF-clear configuration is active.

Email from Joe Metzger, 8:27am, 4/26/05:
I don't fully understand your reasons for clearing the DF bit. I thought that the 'right' answer is to drop 'large' packets with the DF bit set and generate an ICMP Fragmentation Needed message back to the source so that Path MTU discovery works as specified in RFC 1191. Or, are you assuming that path MTU doesn't work due to ICMP filtering, broken TCP/IP stacks, or other reasons so we just have to go ahead and fragment any large packets we see to make it work?
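
The two behaviours under discussion, the RFC 1191 behaviour Joe Metzger describes and the DF-clear workaround configured at KEK, can be compared in a short sketch of the forwarding decision at the tunnel ingress. This is an illustrative model only: the function name and the 1476-byte inner MTU used in the demo are assumptions, not values from the KEK or ESnet routers.

```python
def forward_decision(packet_len, df_set, next_hop_mtu, clear_df=False):
    """Return the action a router takes for an IPv4 packet.

    clear_df=True models the "DF-clear" workaround: the router ignores
    (clears) the DF bit and fragments anyway.
    """
    if packet_len <= next_hop_mtu:
        return "forward"
    if df_set and not clear_df:
        # RFC 1191: drop the packet and report the next-hop MTU in an
        # ICMP "fragmentation needed and DF set" message to the sender.
        return "drop, send ICMP fragmentation needed (MTU %d)" % next_hop_mtu
    return "fragment"


if __name__ == "__main__":
    inner_mtu = 1476  # assumed MTU left for user packets inside the tunnel
    for clear_df in (False, True):
        action = forward_decision(1500, True, inner_mtu, clear_df=clear_df)
        print("clear_df=%s -> %s" % (clear_df, action))
```

The first behaviour depends on the ICMP message actually reaching the sender. If a firewall on the return path filters it, the sender keeps retransmitting full-size packets that are dropped each time, which is consistent with interactive sessions working while bulk transfers stall.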

Email from Yamagata 3:15pm, 4/26/05:
Thanks for the comment.
> I thought that the 'right' answer is to drop 'large' packets with the
> DF bit set and generate an ICMP Fragmentation Needed message back to
> the source so that Path MTU discovery works as specified in RFC 1191.
OK. I have disabled the DF-clear configuration, and am now waiting for comments from BINP.

Jerrod Williams of SLAC sent email 2:30pm 4/26/05:
We are having trouble running Iepm-ABWE to your 'rainbow' machine. This problem began yesterday, April 25, sometime after 2:47PM PST. I am suddenly seeing "Connection Refused" when I try to run Iepm-ABWE tests (abing) to your machine on port 8176. Telnets to this port return:

jerrodw@nettest2 $ telnet rainbow.inp.nsk.su 8176 
telnet: connect to address Connection refused 
Can you verify that this port is open to be accessed from our machines at SLAC (nettest2.slac.stanford.edu[] and iepm-resp.slac.stanford.edu[]). We cannot run our full tests to your site with Iepm-ABWE tests failing. Can you please investigate?

Email from Les Cottrell 5:06 pm 4/26/05:
This may be a spin-off of the new route through KEK, the 1500-byte limitation, and the setting of the DF (don't fragment) bit, see http://www.slac.stanford.edu/grp/scs/net/case/binp-apr05/
ABwE uses a UDP payload of 1450 Bytes. Adding on 8 Bytes of UDP header plus 20 Bytes of IP header gets us up to 1478 Bytes which is indeed what Netflow in the router reports.
Yamagata indicates that he has changed the routing recently to use a route that only allows a 1500Byte MTU. It used to be 1550 Bytes. Yukio Karita also points out that the MTU limit for user packets on this new route is < 1500Bytes.
It is easy for us to reduce the size of the UDP payload. We have reduced it to 1450Bytes but we are still seeing problems. Thus it may have nothing to do with the new route/MTU. On the other hand it may be a port-blocking problem. ABwE uses port 8176 and, as your mail indicates, we cannot connect to this port via telnet (which I would expect to use small packets). So this sounds more like a port-blocking effect than an MTU problem.
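
The packet-size arithmetic in this message, and its interaction with the tunnel overhead Yukio Karita mentions, can be checked with a few lines. The emails do not state the tunnel type, so the 24-byte overhead below (a 20-byte outer IPv4 header plus a 4-byte GRE header) is an assumption; the helper names are hypothetical.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes


def ip_packet_size(udp_payload):
    """Size of the IP packet that carries a UDP payload of this length."""
    return udp_payload + UDP_HEADER + IP_HEADER


def fits_in_tunnel(udp_payload, outer_mtu=1500, tunnel_overhead=24):
    """True if the encapsulated packet fits the outer MTU unfragmented.

    tunnel_overhead=24 assumes GRE over IPv4; it is not from the source.
    """
    return ip_packet_size(udp_payload) + tunnel_overhead <= outer_mtu


def max_udp_payload(outer_mtu=1500, tunnel_overhead=24):
    """Largest UDP payload that avoids fragmentation at the tunnel ingress."""
    return outer_mtu - tunnel_overhead - IP_HEADER - UDP_HEADER


if __name__ == "__main__":
    print("IP packet size for 1450-byte payload:", ip_packet_size(1450))
    print("fits with assumed 24-byte overhead:", fits_in_tunnel(1450))
    print("largest payload that fits:", max_udp_payload())
```

Under that assumption a 1450-byte payload gives the 1478-byte IP packet Netflow reports, which becomes 1502 bytes once encapsulated and so just exceeds a 1500-byte link. With a 20-byte IP-in-IP overhead it would fit, so the conclusion depends on the actual encapsulation in use.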

Email from Yamagata 4/26/05 7:20pm:
I have stopped announcing BINP routes into the new tunnel, and packets from BINP to ESnet are sent to SINET (AS2907), the same as before, even though this may violate the AUP of AS2153. The newer tunnel is used only for feeding ESnet routes to BINP.

Email from Yukio Karita, 4/26/05 10:11pm:
Les, Serge, Joe, and all,
You may already know this, but I'd like to explain the problem.

  1. This trouble began when SINET (NII) started peering with CalREN this month. NII established a new OC48 link connecting Tokyo and Los Angeles at the end of last month, in addition to an OC192 link connecting Tokyo and New York.
  2. The route for SLAC is announced not only to ESnet but also to CalREN. Until last month, the route for SLAC had been given to KEK only via ESnet and SINET, and there had been no problem.
  3. KEK expected to receive the routes for all ESnet laboratories only via ESnet and SINET, and via MANLAN. This is to utilize the 10Gbps bandwidth established by ESnet's 10GE MANLAN connection. I believe that the bandwidth is more important than the RTT for our applications.
  4. The KEK-ESnet tunnel for the BINP-ESnet traffic has been used only for the ESnet-->BINP traffic. This has worked without any problem. KEK has provided to BINP the routes given from ESnet, but has not provided any other foreign routes. This is, per BINP's request, to exclude all the foreign traffic from the KEK-BINP link.
  5. This week, KEK started to announce to BINP the route for SLAC given via CalREN. So the traffic is now asymmetric; SLAC-->BINP is via the KEK-ESnet tunnel via New York, while BINP-->SLAC is via CalREN. I believe this asymmetry doesn't cause any problem.
  6. If you want to have the BINP-SLAC traffic via CalREN in order to get shorter RTT, it can be made by having another tunnel connecting KEK and SLAC directly. This tunnel will be used only for the SLAC-->BINP traffic, and will not have the MTU problem. Do you want this?
Best regards, Yukio

Email from Serge 4/27/05 1:42am:
I try to keep rainbow.inp.nsk.su as open as possible, with almost no firewalling at all (or even exactly no filtering). So if you got a "connection refused" diagnostic it should be taken at face value: no one is listening here on this port. I don't know what program should be doing it. If there was some daemon you started before, it has certainly died and should be restarted. Tell me what program to start and I'll do it, or I can take a look at the diagnostics, if there are any, to see why and when it died. No specific actions were taken on our side beforehand.

There is no program listening on that port, and refusing the connection is a valid response. When I simulate the listening program, everything works OK in both directions after opening a TCP connection:

rainbow$ nc -l 8176

> telnet rainbow.inp.nsk.su 8176
Connected to rainbow.inp.nsk.su.
Escape character is '^]'.
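
Serge's reasoning rests on the difference between a refused connection and one that simply times out. A minimal sketch of the same check in Python follows; the `probe` helper is hypothetical, and the interpretation in the comments is a rule of thumb rather than a guarantee, since some firewalls answer with a reset themselves.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # something accepted the connection
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is
        # listening on this port (the case seen on port 8176).
        return "refused"
    except OSError:
        # Timeouts, unreachable networks, name-resolution failures: more
        # typical of filtering or a routing problem than a dead daemon.
        return "filtered or unreachable"


if __name__ == "__main__":
    print(probe("rainbow.inp.nsk.su", 8176))
```

A "refused" result means packets made the round trip, so routing and MTU are unlikely culprits for that particular failure, which agrees with the conclusion that the abing daemon was not running.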
Email from Jerrod Williams 4/27/05 3:11pm:
The problem I was seeing yesterday has been resolved. The daemon was restarted late yesterday afternoon after Connie recompiled abing on the 'rainbow' machine, having updated it to adjust the packet sizes abing uses. Thanks to all who offered help.

Email from Yamagata 4/27/05 2:08pm:
Current state is DF is enabled, the route is asymmetric. Traffic from BINP to SLAC uses AS2907 and AS2153 link. Traffic from SLAC to BINP uses old ESnet tunnel.

Email from Serge 4/27/05 4:00am:
Thank you for all the labor of fixing routing thingies. As I explained last week, we are afraid of asymmetry near our end, where traffic might go via the angry firewall and get blocked for being unidirectional.
Asymmetry at a relatively far end is not a problem if there is no such strict filtering as we have here.
Some questions still might come to mind -

Email from Yamagata 4/27/05 9:11am:
Indeed, the route is asymmetric between ESnet and KEK, but there is no asymmetry at the BINP tri-head router. Traffic between SLAC and BINP uses the BINP-KEK link in both directions and there is no asymmetry at the BINP router.
> - is SLAC the only victim of the reform?
Hopefully, but I'm not sure. Even so, all routes from ESnet are now fed to BINP. We receive them directly via the new tunnel, so they are no longer affected by the announcements from our upper network.
> - in the case there are some other sites injured by this reform,
>   are the measures taken at KEK sufficient to restore our interactions
>   with them?
So if you still find some unreachable sites, please let us know.

Email from Cottrell 4/27/05 6:48pm:
Yukio,
Thanks for the explanation. It would seem to me that, with the bandwidth from KEK to BINP limited to 512 kbits/s, if there are no problems with MTU then BINP will not be able to tell the difference between an OC48 (via CalREN) and an OC192 (via NY), unless one is much more congested than the other. I suppose if we wanted to be exhaustive we could test the two routes (we actually have measurements on the NY route from the last few months). So my preference would be to try out the shorter RTT, watch it for a couple of weeks, make some measurements, and, all things being roughly equal, stick with it (the shorter RTT via CalREN). However, I bow to the major user, i.e. Serge, to make the final decision.

Page owner: Les Cottrell