nanog mailing list archives

Re: best practice for advertising peering fabric routes

From: Clay Fiske <clay () bloomcounty org>
Date: Wed, 15 Jan 2014 11:33:57 -0800

On Jan 15, 2014, at 10:26 AM, William Herrin <bill () herrin us> wrote:


> Of course working, monitorable and testable are three different
> things. If my NMS can't reach the IXP's addresses, my view of the IXP
> is impaired. And "the Internet is broken" is not a trouble report that
> leads to a successful outcome with customer support... it helps to be
> able to pin things down with some specificity.

This approach concerns me for a number of reasons.

First, having your NMS ping your upstream’s IXP peers probably doesn’t scale. If I’m a peer of a reasonably large
provider, I’m pretty sure I don’t want all their customers hammering my management plane. Even if you’re the only one
doing it, you also don’t know if I’m rate-limiting pings for that or any other reason.

Second, what information do you get that you didn’t already have? If you saw the IP in a traceroute then you know it
exists, is alive, and is in the path, and you have a rough estimate of the latency. Pinging it may even give you
misleading information. Platforms vary and all, but in my experience pinging a router, especially a potentially busy
one peering at an IXP, shows notably worse performance than “real” traffic experiences (admittedly somewhat true of TTL
Expired responses too, but less so in my experience). Now you’re potentially seeing high latency and packet loss which
in reality might not be there at all.

Third, you don’t know that your ping to the peering IP is even taking the same path as the packets addressed to the
real destination. MTR, for example, looks nice, but it would probably be more accurate if it simply ran the traceroute
over and over instead of pinging each hop directly. You would also detect path changes for the real destination that
pinging intermediate hops wouldn’t show you.
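
To make the "run the traceroute over and over" idea concrete, here is a minimal sketch in Python. It assumes you already collect, by whatever means, one hop list per traceroute run toward the real destination; the names diff_paths and path_changes are made up for illustration, and the addresses are documentation prefixes.

```python
from typing import List, Optional, Tuple

# Responding IP for a given TTL, or None if that probe timed out.
Hop = Optional[str]


def diff_paths(prev: List[Hop], curr: List[Hop]) -> List[Tuple[int, str, str]]:
    """Return (ttl, old_hop, new_hop) for each TTL where two runs disagree.

    A timeout (None) on either side is treated as "unknown" rather than as a
    change, since TTL Expired responses are often rate-limited.
    """
    changes = []
    for i in range(max(len(prev), len(curr))):
        old = prev[i] if i < len(prev) else None
        new = curr[i] if i < len(curr) else None
        if old is not None and new is not None and old != new:
            changes.append((i + 1, old, new))
    return changes


def path_changes(runs: List[List[Hop]]):
    """Compare each run with the one before it; return (run_index, changes)."""
    found = []
    for n in range(1, len(runs)):
        changes = diff_paths(runs[n - 1], runs[n])
        if changes:
            found.append((n, changes))
    return found


# Three hypothetical runs toward the same destination.
runs = [
    ["192.0.2.1", "198.51.100.10", "203.0.113.5"],
    ["192.0.2.1", "198.51.100.10", "203.0.113.5"],
    ["192.0.2.1", "198.51.100.20", "203.0.113.5"],  # hop 2 moved
]
print(path_changes(runs))
```

Pinging 198.51.100.10 directly would keep working after the third run and would not tell you that traffic to the real destination had stopped going through it.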

While I appreciate the desire to be able to do as much of your own detective work as possible, I can also see where
you’re now shifting workload onto someone else’s support organization when they’re not necessarily the problem either
(“Hey, my NMS says your peering router is causing latency and packet loss, fix it!”).

I’m also not saying there isn’t a troubleshooting gap caused by this. I’m just not sure being able to ping the IXP hop
solves that problem either.

Semi-related tangent: working in an IXP setting, I have seen weird corner cases cause issues in conjunction with the IXP
subnet existing in BGP. Say someone’s got proxy ARP enabled on their router (sadly, more common than it should be, and
not just from noobs at startups). Now say your IXP is growing and you expand the subnet. No matter how much you harp on
the customers to make the change, they don’t all do it at once. Someone announces the new, larger subnet in BGP. Now
when anyone ARPs for IPs in the new part of the range, proxy ARP guy (still on the smaller subnet) says “hey I have a
route for that, send it here”. That was fun to troubleshoot. :)
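
A rough model of that failure, sketched in Python with the standard ipaddress module. The function would_proxy_arp is a deliberately simplified stand-in for the proxy ARP decision (real implementations apply further checks that are ignored here), and the /25-to-/24 expansion uses documentation addresses, not any real IXP's numbering.

```python
import ipaddress


def would_proxy_arp(target, connected, routes):
    """Simplified proxy ARP decision (an assumption, not any vendor's logic):
    answer an ARP request for `target` if it lies outside the subnet
    configured on the interface but some route in the table covers it.
    """
    ip = ipaddress.ip_address(target)
    if ip in connected:
        return False  # on-link from this router's view; the real owner answers
    return any(ip in route for route in routes)


old_lan = ipaddress.ip_network("192.0.2.0/25")  # subnet before the expansion
new_lan = ipaddress.ip_network("192.0.2.0/24")  # expanded subnet, now in BGP

# Router still configured with the old mask, holding a route for the new prefix.
print(would_proxy_arp("192.0.2.200", old_lan, [new_lan]))  # answers for someone else's IP
print(would_proxy_arp("192.0.2.50", old_lan, [new_lan]))   # old part of the range: stays quiet

# Once the interface mask is updated, the new range is on-link and it stays quiet.
print(would_proxy_arp("192.0.2.200", new_lan, [new_lan]))
```

In this model the problem needs both conditions at once: the stale mask on the interface and a route covering the new part of the range.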

-c
