nanog mailing list archives

RE: FW: Reliability of looking glass sites / rviews


From: Matthew Huff <mhuff () ox com>
Date: Sat, 16 Sep 2017 09:48:32 +0000

ASN 14607, and 129.77.0.0/16

Slightly over an hour after our power event, during which 100% of our equipment was down, this is what I saw at route-views:

BGP routing table entry for 129.77.0.0/16, version 24978989
Paths: (7 available, best #7, table default)
  Not advertised to any peer
  Refresh Epoch 1
  134708 3491 6939 46887 14607
    103.197.104.1 from 103.197.104.1 (123.108.254.70)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  3333 1273 6939 46887 14607
    193.0.0.56 from 193.0.0.56 (193.0.0.56)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  8283 57866 6762 6939 46887 14607
    94.142.247.3 from 94.142.247.3 (94.142.247.3)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 6762:33 6762:16500 8283:15 57866:105
      unknown transitive attribute: flag 0xE0 type 0x20 length 0xC
        value 0000 205B 0000 0006 0000 000F 
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  24441 3491 3491 6939 46887 14607
    202.93.8.242 from 202.93.8.242 (202.93.8.242)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  20912 1267 1273 6939 46887 14607
    212.66.96.126 from 212.66.96.126 (212.66.96.126)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000 9035:50 9035:100 20912:65001
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  1221 4637 6939 46887 14607
    203.62.252.83 from 203.62.252.83 (203.62.252.83)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  2497 6939 46887 14607
    202.232.0.2 from 202.232.0.2 (202.232.0.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
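All seven paths above end in the same three ASNs, 6939 46887 14607, so every route-views peer shown was learning the prefix through AS46887 and then AS6939. The small Python sketch below confirms that by finding the common tail of the AS paths pasted above. It only processes the strings in this output and does not, by itself, show where the stale state lived.

```python
# AS paths copied from the route-views output above
# (leftmost ASN = route-views peer, rightmost = origin AS 14607).
PATHS = [
    "134708 3491 6939 46887 14607",
    "3333 1273 6939 46887 14607",
    "8283 57866 6762 6939 46887 14607",
    "24441 3491 3491 6939 46887 14607",
    "20912 1267 1273 6939 46887 14607",
    "1221 4637 6939 46887 14607",
    "2497 6939 46887 14607",
]


def common_suffix(paths):
    """Return the longest run of ASNs that every path shares at its tail."""
    split = [p.split() for p in paths]
    suffix = []
    # Walk backwards from the origin AS while every path agrees on the hop.
    for hops in zip(*(reversed(s) for s in split)):
        if len(set(hops)) != 1:
            break
        suffix.append(hops[0])
    return list(reversed(suffix))


print(" ".join(common_suffix(PATHS)))  # -> 6939 46887 14607
```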


From: Tim Evens [mailto:tim () snas io] 
Sent: Friday, September 15, 2017 10:45 AM
To: Matthew Huff <mhuff () ox com>
Cc: morrowc.lists () gmail com; nanog () nanog org
Subject: Re: FW: Reliability of looking glass sites / rviews

You didn't mention details about which ASN or prefixes you were checking.  Are you referring to ASN 14607 that only 
advertises two prefixes 129.77.0.0/16 and 2620:0:2810::/48?

Based on what we saw over the weekend (using routeviews data), we see:

Event Start Time: 2017-09-09 11:29:23 UTC (2017-09-09 07:29:23 EDT)
Event End Time: 2017-09-09 13:31:30 UTC (2017-09-09 09:31:30 EDT)
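The two timestamps on each line are the same instant in UTC and in US Eastern Daylight Time (UTC-4 in September). A quick Python check of the conversion and of the window's length is sketched below; the fixed UTC-4 offset is an assumption that only holds while daylight saving time is in effect.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT as a fixed UTC-4 offset (valid for a September date).
EDT = timezone(timedelta(hours=-4), "EDT")

start = datetime(2017, 9, 9, 11, 29, 23, tzinfo=timezone.utc)
end = datetime(2017, 9, 9, 13, 31, 30, tzinfo=timezone.utc)

print(start.astimezone(EDT).strftime("%Y-%m-%d %H:%M:%S %Z"))  # -> 2017-09-09 07:29:23 EDT
print(end.astimezone(EDT).strftime("%Y-%m-%d %H:%M:%S %Z"))    # -> 2017-09-09 09:31:30 EDT
print(end - start)                                              # -> 2:02:07
```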

Are the above times correct?

We see the routes being withdrawn and then coming back. For example:
http://demo-rv.snas.io:3000/dashboard/db/prefix-history?orgId=2&var-prefix=129.77.0.0&var-prefix_len=16&var-asn_num=All&var-router_name=All&var-peer_name=All&from=1504908000000&to=1505203200000

When you checked routeviews, which router and peer were you looking at?  When you did a "show ip bgp ..." did you 
include the prefix length? If not, it would have then shown you 0/0 or 128/5, depending on which router you were on.
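As Tim describes, a lookup without the mask behaves like a forwarding lookup: the router reports the most specific prefix that covers the address, so once 129.77.0.0/16 is gone a covering route such as 128.0.0.0/5 or 0.0.0.0/0 is what comes back. The Python sketch below models only that longest-prefix-match rule with the standard `ipaddress` module; the route tables in it are hypothetical, built from the prefixes named in this message, not the contents of any real route-views router.

```python
import ipaddress


def best_match(address, prefixes):
    """Longest-prefix match: the most specific prefix containing `address`."""
    addr = ipaddress.ip_address(address)
    nets = (ipaddress.ip_network(p) for p in prefixes)
    candidates = [n for n in nets if addr in n]
    return max(candidates, key=lambda n: n.prefixlen, default=None)


# Hypothetical tables: with the customer /16 present, and after it is withdrawn.
with_customer = ["0.0.0.0/0", "128.0.0.0/5", "129.77.0.0/16"]
withdrawn = ["0.0.0.0/0", "128.0.0.0/5"]

print(best_match("129.77.0.0", with_customer))  # -> 129.77.0.0/16
print(best_match("129.77.0.0", withdrawn))      # -> 128.0.0.0/5
print(best_match("129.77.0.0", ["0.0.0.0/0"]))  # -> 0.0.0.0/0
```

Asking for the exact prefix and length (129.77.0.0/16) avoids this, because then only that entry can match.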


--Tim 





On 9/13/17, 8:43 AM, "NANOG on behalf of Matthew Huff" <nanog-bounces () nanog org on behalf of mhuff () ox com> wrote:

    Both should have been similar.
    
    In the first case, we lost power to all of our BGP border routers that are peered with the upstream providers.
    In the second case, I did an explicit “shut” on the interface connected to the upstream provider that appeared “stuck” over an hour after the outage.
    
    From: <christopher.morrow () gmail com> on behalf of Christopher Morrow <morrowc.lists () gmail com>
    Date: Wednesday, September 13, 2017 at 10:58 AM
    To: Matthew Huff <mhuff () ox com>
    Cc: nanog2 <nanog () nanog org>
    Subject: Re: Reliability of looking glass sites / rviews
    
    
    
    On Wed, Sep 13, 2017 at 5:30 AM, Matthew Huff <mhuff () ox com> wrote:
    This weekend our uninterruptible power supply became interruptible and we lost all circuits. While doing initial debugging of the problem as I waited on site power verification, I noticed that there were still paths being shown in rviews for the circuits that were down. This was over an hour after we went hard down, and it took hours before we were back up.
    
    explicit vs implicit withdrawals causing different handling of the problem routes?
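RFC 4271 describes three ways a route leaves service: an explicit withdrawal in an UPDATE message, a replacement announcement for the same prefix, or the BGP connection closing, which implicitly removes every route that neighbor had advertised. The Python sketch below is a toy model of the routes received from a single neighbor; the `Peer` class and the 180-second hold time are illustrative assumptions, not any router's implementation. It shows the practical difference: an explicit withdrawal takes effect at once, while a neighbor that simply goes silent (for example, after losing power) is only dropped when the hold timer expires.

```python
from dataclasses import dataclass, field

# Assumed hold time in seconds; the real value is negotiated when a session opens.
HOLD_TIME = 180


@dataclass
class Peer:
    """Toy model of the routes received from one BGP neighbor (hypothetical)."""
    routes: set = field(default_factory=set)
    last_heard: float = 0.0  # time of the last message received from the neighbor
    up: bool = True

    def update(self, now, announce=(), withdraw=()):
        """An UPDATE: announced prefixes are added, withdrawn ones removed at once."""
        self.last_heard = now
        self.routes |= set(announce)
        self.routes -= set(withdraw)

    def tick(self, now):
        """Silence: nothing changes until the hold timer expires; then the session
        is treated as closed and every route from this neighbor is removed."""
        if self.up and now - self.last_heard >= HOLD_TIME:
            self.up = False
            self.routes.clear()


# Explicit withdrawal: the prefix disappears as soon as the UPDATE arrives.
a = Peer()
a.update(0, announce={"129.77.0.0/16"})
a.update(10, withdraw={"129.77.0.0/16"})
print(a.routes)  # -> set()

# Neighbor goes silent: the route lingers until the hold timer expires.
b = Peer()
b.update(0, announce={"129.77.0.0/16"})
b.tick(60)
print(b.routes)  # -> {'129.77.0.0/16'}
b.tick(180)
print(b.routes, b.up)  # -> set() False
```

In this model even the silent case is bounded by the hold time on the directly connected session; each further hop toward route-views then has to propagate the removal in turn.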
    
    I worked with our providers last night to verify there weren't any hanging static routes, etc. We shut the upstream circuit down, watched the convergence, and saw that eventually all the paths disappeared. Given what we saw on Saturday, what would cause route-views to cache the paths that long? Some looking glass sites only show what they are peered with, or at most what their peers are peered with; that's why I've always used route-views.
    
    What looking glass sites other than route-views would people recommend?
    
    RIPE RIS.
    

 
 
