nanog mailing list archives

Re: FW: Reliability of looking glass sites / rviews


From: Tim Evens <tim () snas io>
Date: Fri, 15 Sep 2017 07:45:12 -0700



You didn't mention details about which ASN or prefixes you were
checking. Are you referring to ASN 14607 that only advertises two
prefixes 129.77.0.0/16 and 2620:0:2810::/48?

Based on what we saw over the weekend (using routeviews data):

Event Start Time: 2017-09-09 11:29:23 UTC (2017-09-09 07:29:23 EDT)
Event End Time: 2017-09-09 13:31:30 UTC (2017-09-09 09:31:30 EDT)

Are the above times correct?
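
As a quick sanity check of the timestamps above, here is a minimal Python sketch that converts the two UTC times to EDT and computes how long the event lasted. The fixed UTC-4 offset and the helper name parse_utc are assumptions made for illustration; they are not part of the routeviews data.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: a fixed UTC-4 offset stands in for US Eastern Daylight Time.
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_utc(stamp):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=UTC)


event_start = parse_utc("2017-09-09 11:29:23")
event_end = parse_utc("2017-09-09 13:31:30")
duration = event_end - event_start

# Print both times in EDT, then the length of the event.
print(event_start.astimezone(EDT).strftime("%Y-%m-%d %H:%M:%S %Z"))
print(event_end.astimezone(EDT).strftime("%Y-%m-%d %H:%M:%S %Z"))
print("duration:", duration)
```

By this arithmetic the event lasted a little over two hours.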

We see the routes withdrawn and then come back. For example:
http://demo-rv.snas.io:3000/dashboard/db/prefix-history?orgId=2&var-prefix=129.77.0.0&var-prefix_len=16&var-asn_num=All&var-router_name=All&var-peer_name=All&from=1504908000000&to=1505203200000
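
To see what time range that link covers, here is a minimal Python sketch that decodes its from and to parameters. It assumes they are Unix epoch milliseconds, which is the usual convention for dashboard URLs of this style; the helper name ms_to_utc is made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

url = (
    "http://demo-rv.snas.io:3000/dashboard/db/prefix-history"
    "?orgId=2&var-prefix=129.77.0.0&var-prefix_len=16&var-asn_num=All"
    "&var-router_name=All&var-peer_name=All"
    "&from=1504908000000&to=1505203200000"
)


def ms_to_utc(ms):
    """Convert epoch milliseconds (assumed unit) to a UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


# parse_qs returns a list of values per key; take the first of each.
params = parse_qs(urlsplit(url).query)
window_start = ms_to_utc(params["from"][0])
window_end = ms_to_utc(params["to"][0])

print("prefix:", params["var-prefix"][0] + "/" + params["var-prefix_len"][0])
print("window:", window_start.isoformat(), "to", window_end.isoformat())
```

Under that assumption, the window comfortably brackets the event times listed earlier.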

When you checked routeviews, which router and peer were you looking at?
When you did a "show ip bgp ...", did you include the prefix length? If
not, it would have shown you the covering 0/0 or 128/5 route instead,
depending on which router you were on.
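
To illustrate why a lookup without the prefix length can still return something after the /16 is withdrawn, here is a minimal Python sketch of a longest-prefix match over a toy table. The table contents and the function name longest_match are hypothetical, and real router lookup behavior varies by platform; this only models the general idea of matching an address against covering routes.

```python
import ipaddress


def longest_match(address, table):
    """Return the most specific network in `table` containing `address`, or None."""
    addr = ipaddress.ip_address(address)
    candidates = [net for net in table if addr in net]
    return max(candidates, key=lambda net: net.prefixlen, default=None)


# Hypothetical tables: one router carries a default route, another a 128/5 covering route.
router_a = [ipaddress.ip_network("0.0.0.0/0")]
router_b = [ipaddress.ip_network("128.0.0.0/5")]
# The same table as router_b, but with the /16 still present.
router_b_before = router_b + [ipaddress.ip_network("129.77.0.0/16")]

print(longest_match("129.77.0.0", router_a))         # covering default route
print(longest_match("129.77.0.0", router_b))         # covering 128/5 route
print(longest_match("129.77.0.0", router_b_before))  # the /16 itself
```

Including the prefix length in the query asks for that exact entry, which avoids mistaking a covering route for the withdrawn prefix.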

--Tim 

On 9/13/17, 8:43 AM, "NANOG on behalf of Matthew Huff"
<nanog-bounces () nanog org on behalf of mhuff () ox com> wrote:

Both should have been similar.

In the first case, we lost power to all of our BGP border routers that
are peered with the upstream providers.
In the second case, I did an explicit "shut" on the interface connected
to the upstream provider that appeared "stuck" an hour after the
outage.

From: <christopher.morrow () gmail com> on behalf of Christopher Morrow
<morrowc.lists () gmail com>
Date: Wednesday, September 13, 2017 at 10:58 AM
To: Matthew Huff <mhuff () ox com>
Cc: nanog2 <nanog () nanog org>
Subject: Re: Reliability of looking glass sites / rviews

On Wed, Sep 13, 2017 at 5:30 AM, Matthew Huff
<mhuff () ox com> wrote:
This weekend our uninterruptible power supply became interruptible and
we lost all circuits. While doing initial debugging of the problem as I
waited on site power verification, I noticed that there were still paths
being shown in rviews for the circuits that were down. This was over an
hour after we went hard down, and it took hours before we were back up.

explicit vs implicit withdrawals causing different handling of the
problem routes?

I worked with our providers last night to verify there weren't any
hanging static routes, etc... We shut the upstream circuit down and
watched the convergence and saw that eventually all the paths
disappeared. Given what we saw on Saturday, what would cause route-views
to cache the paths that long? Some looking glass sites only show what
they are peered with or at most what their peers are peered with, that's
why I've always used route-views.

What looking glass sites other than route-views would people recommend?

ripe ris.

