nanog mailing list archives

Re: plea for comcast/sprint handoff debug help


From: Tim Bruijnzeels <tim () nlnetlabs nl>
Date: Mon, 2 Nov 2020 09:13:16 +0100

Hi Randy, all,

On 31 Oct 2020, at 04:55, Randy Bush <randy () psg com> wrote:

>> If there is a covering less specific ROA issued by a parent, this will
>> then result in RPKI invalid routes.
>
> i.e. the upstream kills the customer.  not a wise business model.

I did not say it was. But this is the problematic case.

For the vast majority of ROAs, the sustained loss of the repository would lead to invalid ROA *objects*. These would no 
longer be used in Route Origin Validation, leaving the associated announcements in the state 'Not Found'.

This is not the case if there are other ROAs for the same prefixes published by others (most likely the parent). In 
that case the announcements become 'Invalid' rather than 'Not Found'. Quick back-of-the-envelope analysis: this affects 
about 0.05% of ROA prefixes.
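
To make the two outcomes concrete, here is a minimal sketch of origin validation in the spirit of RFC 6811. It is 
not taken from any validator: the VRP tuple, the validate() function, the documentation prefixes and the private ASNs 
are all assumptions for illustration, and details such as AS0 handling are left out.

```python
from ipaddress import ip_network
from typing import NamedTuple


class VRP(NamedTuple):
    """Hypothetical validated ROA payload: prefix, max length, origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, vrps: list) -> str:
    """Return 'valid', 'invalid' or 'not-found' for one announcement."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        roa_net = ip_network(vrp.prefix)
        # Skip VRPs of the other address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "not-found"


# Illustrative data: the customer's own ROA and a covering parent ROA.
customer = VRP("192.0.2.0/25", 25, 64501)
parent = VRP("192.0.2.0/24", 24, 64500)

print(validate("192.0.2.0/25", 64501, [customer, parent]))  # both ROAs usable
print(validate("192.0.2.0/25", 64501, []))                  # customer ROA lost, no covering ROA
print(validate("192.0.2.0/25", 64501, [parent]))            # customer ROA lost, parent ROA remains
```

The last call is the problematic case: once the customer's ROA drops out, the parent's less specific ROA still covers 
the announcement without matching it.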

>> The fall-back may help in cases where there is an accidental outage of
>> the RRDP server (for as long as the rsync servers can deal with the
>> load)
>
> folk try different software, try different configurations, realize that
> having their CA gooey exposed because they wanted to serve rrdp and
> block, ...

We are talking here about the HTTPS server being unavailable while rsync *is* available.

So this means your HTTPS server is down, unreachable, or has an issue with its HTTPS certificate. Repository operators 
could use a CDN if they don't want to do all this themselves. They could monitor, and fix things. There is time.

The thing is, even if HTTPS becomes unavailable, this still leaves hours (8 by default for the Krill CA, configurable) 
to fix things. Routinator (and the RIPE NCC Validator, and others) will use cached data if they cannot retrieve new 
data. It's only when manifests and CRLs start to expire that the objects would become invalid.
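
As a rough sketch of that timing, assuming the cached manifest and CRL expire 8 hours after the last successful fetch 
(the Krill default mentioned above): the cache_usable() helper and the timestamps below are hypothetical and do not 
reflect how any particular validator implements its cache.

```python
from datetime import datetime, timedelta, timezone


def cache_usable(now: datetime, expiry: datetime) -> bool:
    """Cached objects stay usable until the manifest/CRL expiry time passes."""
    return now < expiry


# Hypothetical timeline: last successful fetch, then HTTPS goes away.
last_fetch = datetime(2020, 11, 1, 0, 0, tzinfo=timezone.utc)
expiry = last_fetch + timedelta(hours=8)  # assumed 8-hour window

for hours in (1, 5, 9):
    now = last_fetch + timedelta(hours=hours)
    state = "use cached data" if cache_usable(now, expiry) else "objects expired"
    print(f"{hours}h after last fetch: {state}")
```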

So the fallback helps in the case of HTTPS incidents that were not fixed within 8 hours, and then for about 0.05% of 
prefixes.

On the other hand, the fallback exposes a Malicious-in-the-Middle replay attack surface for 100% of the prefixes 
published using RRDP, 100% of the time. This allows attackers to prevent changes in ROAs from being seen.

This is a tradeoff. I think that protecting against replay should be considered more important here, given the numbers 
and the time available to fix HTTPS issues.


> randy, finding the fort rp to be pretty solid!

Unrelated, but sure I like Fort too.

Tim
