nanog mailing list archives

Re: Google peering pains in Dallas


From: Jared Mauch <jared () puck nether net>
Date: Thu, 30 Apr 2020 14:09:33 -0400



On Apr 29, 2020, at 7:59 PM, Kaiser, Erich <erich () gotfusion net> wrote:

So it has been 3 weeks of major ICMP packet loss to any Google service over the Dallas Equinix IX.  It is not
affecting performance of the service, but it is affecting us with customer complaints and service calls, because some software
uses it for monitoring purposes and people use it for benchmark testing.  I have been told by them that they now know
the cause and know that a large ISP on the IX is causing the issue (Hmm, wonder who that is...), so why do they not
shut down the peering with them and force the ISP to fix the issue?  This issue is affecting everyone on the IX, not just
us.  Very, very frustrating.  Hopefully this will reach someone over there who can do something about it….

Issues with the IXP ecosystem aren’t new in the US.  This is why some providers don’t appear at IXPs.  The original case of
one member being able to hurt everyone was really the GIGAswitch HOLB (head-of-line blocking) issue, which was triggered by
congested ports.

(Waits for others to crawl out of the woodwork who were more involved in this :-) 
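For readers who haven’t run into head-of-line blocking, a toy model may help show how one member’s congested port can hurt
everyone else.  This is a generic Python sketch under simplifying assumptions (one packet forwarded per tick, a congested
output that never drains), not a model of the actual switch hardware involved; the function names are hypothetical.  With a
single FIFO per input port, a packet headed for a congested output stalls everything queued behind it.  Per-output queues
(commonly called virtual output queueing) let the other traffic through.

```python
from collections import deque


def simulate_fifo_input(packets, blocked_outputs, ticks):
    """One switch input port with a single FIFO queue.

    packets: output-port names, in arrival order.
    blocked_outputs: output ports that are congested and accept nothing.
    Returns the packets delivered within `ticks` time steps.
    """
    queue = deque(packets)
    delivered = []
    for _ in range(ticks):
        if not queue:
            break
        if queue[0] in blocked_outputs:
            # Head-of-line blocking: the head packet cannot leave, so every
            # packet behind it waits, even traffic for idle output ports.
            continue
        delivered.append(queue.popleft())
    return delivered


def simulate_per_output_queues(packets, blocked_outputs, ticks):
    """Same traffic, but queued separately per output port."""
    queues = {}
    for out in packets:
        queues.setdefault(out, deque()).append(out)
    delivered = []
    for _ in range(ticks):
        # Forward one packet per tick from the first queue whose output is free.
        for out, q in queues.items():
            if q and out not in blocked_outputs:
                delivered.append(q.popleft())
                break
    return delivered


traffic = ["congested", "b", "c", "d"]
print(simulate_fifo_input(traffic, {"congested"}, ticks=10))         # []
print(simulate_per_output_queues(traffic, {"congested"}, ticks=10))  # ['b', 'c', 'd']
```

In the single-FIFO case nothing gets through at all, because the first packet is stuck behind the congested port.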

This is why the majority of traffic volume for interconnection has generally been over private peering links (paid,
SFI, or otherwise).

If you tried to force it all through the IXP ecosystem, the tens of Tbps wouldn’t fit, even city by city.  Things like CDNs,
the Netflix OpenConnect, and the like have really shifted demand off the interconnection points as much as
feasible.  Sometimes an organization can’t handle it, or it clings to its old ways.  Sometimes it takes
organizational change or people change to improve the situation.

I know it can sound like a broken record, but upgrading offload paths to match capacity demands really can make a
difference.  It may also expose other weak points.  My personal goal is to stop thinking about things in the 95/5
model and think more in terms of a peak model.  95/5 only gets you so far; the peaks are really where networks can shine
or show their age.
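The gap between the two models is easy to see with a small calculation.  Below is a Python sketch, assuming "95/5" means the
common nearest-rank 95th-percentile calculation over fixed-interval utilization samples (typically 5-minute averages): sort
the samples, discard the top 5%, and take the highest remaining one.  The 10 Gb/s link and the sample values are made up for
illustration.

```python
def percentile_95(samples_mbps):
    """Return the 95th-percentile ("95/5") value of utilization samples.

    Nearest-rank method: sort the samples, discard the top 5%, and report
    the highest remaining sample.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n), computed with integers to avoid floating-point surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical 10 Gb/s link: mostly 4 Gb/s, with a few saturated intervals.
capacity_mbps = 10_000
samples = [4_000] * 96 + [10_000] * 4

p95 = percentile_95(samples)
peak = max(samples)
print(f"95th percentile: {p95} Mb/s ({p95 / capacity_mbps:.0%} of capacity)")
print(f"peak:            {peak} Mb/s ({peak / capacity_mbps:.0%} of capacity)")
```

Here the 95/5 figure reports the link at 40% of capacity, while it actually ran full in 4 of the 100 intervals.  Those
saturated intervals are invisible to the 95/5 number but are exactly when users see loss.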

I understand it’s not always possible to upgrade links, and that sometimes one party holds out on the other.  That’s
certainly not the case at $dayjob, and I try to ensure the process works as well as it can here.

Sometimes it’s best to just de-peer a network.  You may find it works out better for all involved.

At $nightJob I want to peer as much traffic off as possible, but if the network paths aren’t there or are low-speed, it may
not make sense.

Evaluate your peers periodically to ensure you are getting what you expect.
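A periodic review can be as simple as checking each peering link’s peak utilization against its capacity.  The Python sketch
below is hypothetical: the `PeerLink` record, the peer names, and the 80% threshold are assumptions for illustration, and a
real review would also weigh cost, path quality, and traffic volume.  It flags links whose peak meets or exceeds the threshold,
worst first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PeerLink:
    name: str                 # hypothetical peer identifier
    capacity_mbps: float      # provisioned port or link capacity
    samples_mbps: List[float] = field(default_factory=list)  # e.g. 5-minute averages


def review_peers(links, peak_threshold=0.8):
    """Return (name, peak_ratio) for links whose peak utilization is at or
    above peak_threshold of capacity, sorted worst first."""
    flagged = []
    for link in links:
        if not link.samples_mbps:
            continue  # no data for this link; nothing to judge
        peak_ratio = max(link.samples_mbps) / link.capacity_mbps
        if peak_ratio >= peak_threshold:
            flagged.append((link.name, peak_ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


links = [
    PeerLink("peer-a", 10_000, [3_000, 9_500, 4_000]),
    PeerLink("peer-b", 100_000, [20_000, 30_000]),
    PeerLink("peer-c", 10_000, [8_000, 8_200]),
]
for name, ratio in review_peers(links):
    print(f"{name}: peak at {ratio:.0%} of capacity")
```

Keying on peak rather than a percentile follows the peak-model point above: a link can look healthy at 95/5 and still be
full when it matters.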

- jared
