Re: F.ROOT-SERVERS.NET moved to Beijing?


From: Danny McPherson <danny () tcb net>
Date: Mon, 3 Oct 2011 12:38:25 -0400


On Oct 3, 2011, at 11:20 AM, Leo Bicknell wrote:

> Thus the impact to valid names should be minimal, even in the face
> of longer timeouts.

If you're performing validation on a recursive name server (or 
similar resolution process), expecting a signed response, yet the 
response you receive is either unsigned or doesn't validate 
(i.e., is bogus), you have to:

1) ask other authorities?  how many?  how frequently?  impact?
2) consider implications for the _entire_ chain of trust?
3) tell the client something?  
4) cache what (e.g., zone cut from who you asked)? how long? 
5) other?

"minimal" is not what I was thinking..

> Network layer integrity and secure routing don't help the majority of
> end users.  At my house I can choose Comcast or AT&T service.  They will
> not run BGP with me; I could not apply RPKI, secure BGP, or any other
> method to those connections.  They may well do NXDOMAIN remapping on their
> resolvers, or even try to transparently rewrite DNS answers.  Indeed,
> some ISPs have even experimented with injecting data into port 80
> traffic transparently!
>
> Secure networks only help if the users have a choice, and choose not to
> use "bad" networks.  If you want to be able to connect at Starbucks, or
> the airport, or even the conference-room Wi-Fi at a client's site, you need
> to assume there's a rogue network in the middle.
>
> The only way for a user to know what they are getting is end to end
> crypto.  Period.

I'm not sure how "end to end" crypto helps end users in the event
of connectivity and *availability* issues resulting from routing 
brokenness in an upstream network which they do not control. 
"Crypto", OTOH, depending on what it is and where in the stack it's 
applied, might well align with my "network layer integrity" 
assertion.
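
A quick probe for the NXDOMAIN remapping Leo describes above:
resolve a name that should not exist and see whether an answer comes
back anyway.  A minimal sketch with dnspython (the random probe label
is illustrative, and a single probe is only a heuristic):

    # Sketch: heuristic check for NXDOMAIN remapping by the local resolver.
    import secrets
    import dns.resolver

    res = dns.resolver.Resolver()              # system/ISP resolver
    probe = secrets.token_hex(16) + '.com'     # almost certainly nonexistent

    try:
        ans = res.resolve(probe, 'A')
        print('answer for a nonexistent name -- remapping suspected:',
              [r.address for r in ans])
    except dns.resolver.NXDOMAIN:
        print('clean NXDOMAIN -- no rewriting observed')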

> As for the speed of detection, it's either instantaneous (DNSSEC
> validation fails) or it doesn't matter how long it is (minutes,
> hours, days).  The real problem is the time to resolve.  It doesn't
> matter if we can detect in seconds or minutes when it may take hours
> to get the right people on the phone and resolve it.  Consider this
> weekend's activity; it happened on a weekend for both an operator
> based in the US and a provider based in China, so you're dealing
> with weekend staff and a 12-hour time difference.
>
> If you want to ensure accuracy of data, you need DNSSEC, period.
> If you want to ensure low-latency access to the root, you need
> multiple anycasted instances, because at any one point in time a
> particular one may be "bad" (node near you down for maintenance,
> routing issue, who knows), which is part of why there are 13 root
> servers.  Those two things together can make for resilience,
> security, and high performance.

You miss the point here, Leo.  If the operator of a network service 
can't detect issues *when they occur* in the current system, whether 
unintentional or malicious, in some automated manner, they won't be 
alerted, they certainly can't "fix" the problem, and the potential 
exposure window can be significant.
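
One concrete, automatable signal for the anycast case: the
conventional hostname.bind CHAOS TXT query asks a name server to
identify itself, and F-root answers it.  A minimal sketch with
dnspython, using f.root-servers.net's well-known IPv4 address:

    # Sketch: identify which anycast instance of F-root is answering.
    import dns.message
    import dns.query
    import dns.rdataclass
    import dns.rdatatype

    F_ROOT = '192.5.5.241'    # f.root-servers.net

    q = dns.message.make_query('hostname.bind', dns.rdatatype.TXT,
                               rdclass=dns.rdataclass.CH)
    r = dns.query.udp(q, F_ROOT, timeout=3)
    for rrset in r.answer:
        print(rrset)          # the node name of the answering instance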

Ideally, the trigger for the alert and detection function is more 
mechanized than "notification by services consumer", and the network 
service operators or other network operators aware of the issue have 
some ability to institute reactive controls to surgically deal with 
that particular issue, rather than being captive to the [s]lowest 
common denominator of all involved parties, and dealing with 
additional non-deterministic failures or exposure in the interim.
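
A sketch of that kind of mechanized detection, building on the
hostname.bind query above: poll the answering instance's identity and
alert when it changes.  The polling interval and alert handling here
are illustrative, not prescriptive:

    # Sketch: poll the anycast instance identity and flag changes.
    import time
    import dns.exception
    import dns.message
    import dns.query
    import dns.rdataclass
    import dns.rdatatype

    F_ROOT = '192.5.5.241'    # f.root-servers.net

    def instance_id():
        q = dns.message.make_query('hostname.bind', dns.rdatatype.TXT,
                                   rdclass=dns.rdataclass.CH)
        try:
            r = dns.query.udp(q, F_ROOT, timeout=3)
            return r.answer[0][0].strings[0] if r.answer else b'no-answer'
        except dns.exception.Timeout:
            return b'timeout'

    baseline = instance_id()
    while True:
        current = instance_id()
        if current != baseline:
            print('ALERT: F-root instance changed:', baseline, '->', current)
            baseline = current
        time.sleep(300)       # poll every five minutes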

Back to my earlier point: for *resilience*, network layer integrity 
techniques and secure routing infrastructure are the only preventative 
controls here, and they necessarily augment DNSSEC's authentication and 
integrity functions at the application layer.  Absent these, rapid 
detection that enables reactive controls to mitigate the issue is 
necessary.

-danny

