nanog mailing list archives

Re: Theoretical question about cyclic dependency in IRR filtering


From: Christopher Morrow <morrowc.lists () gmail com>
Date: Tue, 30 Nov 2021 12:05:13 -0500

On Tue, Nov 30, 2021 at 3:20 AM Ben Maddison <benm@workonline.africa> wrote:

Hi Chris,

On 11/29, Christopher Morrow wrote:
On Mon, Nov 29, 2021 at 8:14 AM Job Snijders via NANOG <nanog () nanog org>
wrote:

Hi Anurag,

Circular dependencies definitely are a thing to keep in mind when
designing IRR and RPKI pipelines!

In the case of IRR: it is quite rare to query the RIR IRR services
directly. Instead, the common practice is that utilities such as bgpq3,
peval, and bgpq4 query “IRRd” (https://IRRd.net) instances at, for
example, whois.radb.net and rr.ntt.net. You can verify this with
tcpdump. These IRRd instances serve as intermediate caches, and will
continue to serve old cached data in case the origin is down. This
property of the global IRR deployment avoids a lot of potential for
circular dependencies.
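
For illustration, a minimal sketch of that pattern driven from Python:
it shells out to bgpq4 and builds a prefix filter from whois.radb.net
rather than from an RIR source DB. The AS-set name is a placeholder,
and bgpq4 is assumed to be installed locally.

  import subprocess

  # Hypothetical AS-set; substitute your own. bgpq4's -h flag selects
  # which IRR server to query -- here an IRRd mirror, not an RIR source DB.
  AS_SET = "AS-EXAMPLE"

  result = subprocess.run(
      ["bgpq4", "-h", "whois.radb.net", "-4", "-A",
       "-l", f"{AS_SET}-prefixes", AS_SET],
      capture_output=True, text=True, check=True,
  )
  # Cisco-style prefix-list built entirely from the mirror's cached data
  print(result.stdout)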

Also, some organisations use threshold checks before deploying new
IRR-based filters, to reduce the risk of “misfiring”.


beyond just 'did the deployed filter change by +/- X%?', you probably
don't want to deploy content if you can't actually talk to the
source... which was Anurag's proposed problem.
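
A minimal sketch of such a pre-deployment sanity check, in Python (the
file paths and the 20% threshold are illustrative assumptions, not
anything from the thread):

  import sys

  MAX_DELTA = 0.20  # assumed threshold: refuse changes larger than 20%

  def load_prefixes(path):
      """One prefix per line; the paths below are hypothetical."""
      with open(path) as f:
          return {line.strip() for line in f if line.strip()}

  old = load_prefixes("deployed/AS-EXAMPLE.prefixes")
  new = load_prefixes("candidate/AS-EXAMPLE.prefixes")

  # An empty result usually means the source was unreachable, not that
  # the AS-set is genuinely empty -- never deploy it.
  if not new:
      sys.exit("refusing to deploy an empty prefix filter")

  delta = len(new ^ old) / max(len(old), 1)
  if delta > MAX_DELTA:
      sys.exit(f"filter changed by {delta:.0%}; holding for manual review")

  print("filter passes sanity checks; safe to deploy")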

The point that Job was (I think?) trying to make was that the issue
Anurag envisioned can be avoided by querying a mirror for IRR data at
filter-generation time, rather than the source DB directly.

I would recommend that anyone (esp. transit operators) using IRR data
for filter generation run a local mirror whose reachability is not
subject to IRR-based filters.
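
For example, an IRRd 4 instance can mirror RADB over NRTM with a
sources stanza roughly like the one below. This is a sketch based on
IRRd's documented configuration format; verify host names and dump URLs
against the source operator's documentation.

  sources:
    RADB:
      authoritative: false
      # initial full import of the database dump ...
      import_source: https://ftp.radb.net/radb/dbase/radb.db.gz
      import_serial_source: https://ftp.radb.net/radb/dbase/RADB.CURRENTSERIAL
      # ... then continuous updates over NRTM from the origin
      nrtm_host: whois.radb.net
      nrtm_port: 43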


yup, sure; "remove external dependencies, move them internal" :)
you can STILL end up with zero prefixes even in this case, of course.


Of course, disruption of the NRTM connection between the mirror and the
source DB can still result in local data becoming stale/incomplete.
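
One way to catch that staleness is to watch the serial numbers. A rough
Python sketch, assuming an IRRd-style server that answers the '!j'
status query (the mirror host name is a placeholder, and parsing of the
response is deliberately left loose):

  import socket

  def irrd_status(host, source, port=43):
      """Send an IRRd '!j' status query (current serial range for a
      source) and return the raw response text."""
      with socket.create_connection((host, port), timeout=10) as s:
          s.sendall(f"!j{source}\n!q\n".encode())
          chunks = []
          while True:
              chunk = s.recv(4096)
              if not chunk:
                  break
              chunks.append(chunk)
      return b"".join(chunks).decode(errors="replace").strip()

  # If the local mirror's serial stops advancing while the origin's
  # keeps moving, the NRTM feed is likely broken and data is going stale.
  print("origin:", irrd_status("whois.radb.net", "RADB"))
  print("mirror:", irrd_status("irrd.example.net", "RADB"))  # hypothetical mirror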


yup!


You can imagine a situation where an NRTM update to an object covering
the source DB address space is missed during a connectivity outage, and
that missed change causes the outage to become persistent.
However, I think that is fairly contrived. I have certainly never seen
it in practice.


sure, there's a black-swan comment in here somewhere :)
The overall comment set here is really:
  "Plan for errors and graceful resumption of service in their existence"
  (and planning is hard)


Cheers,

Ben

