nanog mailing list archives
Re: ultradns reachability
From: Joe Abley <jabley () isc org>
Date: Fri, 2 Jul 2004 10:22:09 -0400
On 2 Jul 2004, at 00:18, Christopher L. Morrow wrote:
So, I thought of it like this:
1) Rodney/Centergate/UltraDNS knows where all their 35000 billion copies of the 2 .org TLD boxes are, what network pieces they are connected to at which bandwidths and the current utilization
2) Rodney/Centergate/UltraDNS knows which boxes in each location (there could be multiple inside each pod, right?) are running their dns process and answering at which rates
3) Rodney/Centergate/UltraDNS knows when processes die and locally stop pushing requests to said system inside the pod
4) Rodney/Centergate/UltraDNS knows when a pod is completely down (no systems responding inside the local pod) so they can stop routing the /24 from that pod's location
So, Rodney/Centergate/UltraDNS should know almost exactly when they have a problem they can term 'critical'... I most probably left out some steps above, like wedged processes or loss of outbound routing to prefixes sending requests. I'm sure Paul/ISC has a fairly complete list of failure modes for anycast DNS services.
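Steps 3 and 4 above can be sketched roughly as follows, in Python. The `Server` record and the `pod_actions` function are hypothetical names used only for illustration; nothing here describes what UltraDNS actually runs, and how health is measured or how the /24 is actually withdrawn is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    process_alive: bool  # is the DNS process running? (assumed input)
    answering: bool      # did it answer a recent test query? (assumed input)


def healthy(server):
    # A wedged process can be alive without answering, so require both.
    return server.process_alive and server.answering


def pod_actions(servers):
    """Return (names to keep in the local rotation, whether to announce the /24)."""
    in_rotation = [s.name for s in servers if healthy(s)]
    # Step 4: stop routing the anycast /24 when nothing in the pod responds.
    announce = len(in_rotation) > 0
    return in_rotation, announce
```

A check like this only sees failures local to the pod. It says nothing about loss of routing to or from the prefixes sending requests, which is the harder part of the question.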
All the failure modes that ISC has seen with anycast nameserver instances can be avoided (for the authoritative DNS service as a whole) by including one or more non-anycast nameservers in the NS set.
This leaves the anycast servers providing all the optimisation that they are good for (local nameserver in topologically distant networks; distributed DDoS traffic sink; reduced transaction RTT) and provides a fall-back in case of effective reachability problems for the anycast nameservers.
This is so trivial, I continue to be amazed that PIR hasn't done it.
The problem then becomes the "Hey, .org is dead!" From where is it dead? What pod are you seeing it dead from? Is it routing TO the pod from you? FROM the pod to you? The pod itself? Stuck/stale routing information somewhere on the path(s)? This is very complex, or seems to be to me :(
With the fix above, the problem becomes "hey, *some* of the nameservers for ORG are dead! We should fix that, but since not *all* of them are dead, at least ORG still works."
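The difference is easiest to see from the point of view of a resolver that cannot reach either anycast address. What follows is a minimal sketch in Python, not a description of any real resolver: the `resolve` function and the `.example` nameserver names are made up for illustration, and real resolvers pick servers by measured RTT rather than simply walking the list in order.

```python
def resolve(ns_set, reachable):
    """Try each nameserver in the NS set in turn; return the first that answers.

    ns_set:    list of (name, kind) pairs, where kind is "anycast" or "unicast"
    reachable: set of nameserver names that answer from this vantage point
               (assumed to be known here; a real resolver finds out by timing out)
    """
    for name, _kind in ns_set:
        if name in reachable:
            return name
    return None  # every nameserver failed: the zone is dead from here


# Hypothetical NS sets: two anycast names, with and without a unicast fall-back.
anycast_only = [("anycast1.example", "anycast"), ("anycast2.example", "anycast")]
mixed = anycast_only + [("unicast1.example", "unicast")]

# A vantage point whose path to both anycast addresses is broken.
reachable = {"unicast1.example"}

print(resolve(anycast_only, reachable))  # None
print(resolve(mixed, reachable))         # unicast1.example
```

With the anycast-only set, this vantage point gets no answer at all. With one unicast server added, it gets a slower answer from the fall-back instead of none.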
I think more failure modes will be investigated before that comes :) fortunately lots of people are already investigating these, eh?
I don't know about lots, but I know of a few. None of the people I know of are using an entire production TLD as their test-bed, however.
Joe