nanog mailing list archives

Re: Reliable Cloud host ?


From: Owen DeLong <owen () delong com>
Date: Tue, 28 Feb 2012 11:46:28 -0800


On Feb 28, 2012, at 10:22 AM, William Herrin wrote:

On Tue, Feb 28, 2012 at 9:02 AM, Jared Mauch <jared () puck nether net> wrote:
On Feb 27, 2012, at 2:53 PM, Valdis.Kletnieks () vt edu wrote:
On Mon, 27 Feb 2012 14:02:04 EST, William Herrin said:

The net result is that when you switch the IP address of your server,
a percentage of your users (declining over time) will be unable to
access it for hours, days, weeks or even years regardless of the DNS
TTL setting.

Amen brother.

So just for grins, after seeing William's post I set up a listener on an address
that had an NTP server on it many moons ago. As in, the machine was shut down
around 2002/06/30 22:49 and we haven't re-assigned the IP address since
*because* it kept getting hit with NTP packets. Yes, a decade ago.

In the first 15 minutes, 234 different IPs have tried to NTP to that address.
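
For anyone who wants to repeat that kind of measurement, here is a rough sketch of a listener that counts distinct source addresses over a fixed window. It is not the tool used above; the function name, buffer size, and the commented-out bind address and port are all assumptions for illustration. Binding to UDP port 123 normally requires elevated privileges.

```python
import socket
import time

def count_distinct_sources(sock, duration_s):
    """Receive datagrams on an already-bound UDP socket for duration_s
    seconds and return the set of source IP addresses seen."""
    sources = set()
    deadline = time.monotonic() + duration_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            _data, peer = sock.recvfrom(2048)  # payload is ignored
        except socket.timeout:
            break
        sources.add(peer[0])  # peer is (ip, port) for IPv4
    return sources

# Hypothetical usage (address and window are assumptions):
# listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# listener.bind(("0.0.0.0", 123))
# print(len(count_distinct_sources(listener, 15 * 60)))
```

The listener never answers, so it only shows who is still sending to the old address, not why.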

I hereby reject the principle that one can not renumber a
host/name and move it.
I reject the idea that you can't move a service, or have one
MX, DNS, etc.. host be down and have it be fatal without
something else being SERIOUSLY broken.  If you are right,
nobody could ever renumber anything ever, nor take a
service down ever in the most absolute terms.

Something else IS seriously broken. Several something elses actually:

1. DNS TTL at the application boundary, due in part to...

DNS TTL shouldn't make it to the application boundary...

2. Pushing the name to layer 3 address mapping process up from layer 4
to layer 7 where each application has to (incorrectly) reinvent the
process, and...

But they don't have to... They can simply use getaddrinfo()/getnameinfo()
and let the OS libraries do it. The fact that some applications choose to
use their own resolvers instead of system libraries is what is broken.
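
As a minimal sketch of what "let the OS libraries do it" can look like, the following Python uses the standard library's wrapper around getaddrinfo() and resolves the name again on every connection attempt instead of caching an address inside the application. The helper name connect_by_name and the timeout value are assumptions for illustration.

```python
import socket

def connect_by_name(host, port, timeout=5.0):
    """Resolve host through the system resolver on each call and try
    every returned address in order until one connects."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # caller owns and closes the socket
        except OSError as exc:
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses returned for %s" % host)
```

Because nothing here holds on to an address between calls, any caching is left to the system resolver rather than to the application.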

3. A layer 4 protocol which overloads the layer 3 address as an
inseverable component of its transport identifier.

Even stuff like SMTP which took care to respect the DNS TTL in its own
standards gets busted at the back end: too many antispam process
components rely on the source IP address, crushing large scale servers
that suddenly appear, transmitting large amounts of email from a fresh
IP address.

I think this is orthogonal to DNS TTL issues.

Shockingly enough we have a strongly functional network despite this
brokenness. But, it's broken all the same and renumbering is majorly
impaired as a consequence.


In my experience, the biggest hurdle to renumbering has nothing to do with DNS,
DNS TTLs, respect or failure to respect them, etc.

In my experience the biggest renumbering challenges come from the number of configuration
files which contain your IP addresses yet are not under your control.
        VPNs (the configuration at the far side of the VPN)
        Firewalls (vendors, clients, etc. that have put your IP addresses into exceptions)
        Router configurations (vendors, clients, etc. that have special routing policy to reach you)
        etc.

These are the things that make renumbering hard. The DNS stuff is usually fairly trivial to work around with a little time and planning.


Renumbering in light of these issues isn't impossible. An overlap
period is required in which both old and new addresses are operable.

That's desirable even if you had a 5 second TTL and everyone honored it.

The duration of that overlap period is not defined by the protocol
itself. Rather, it varies with the tolerable level of residual
brokenness, literally how many nines of users should be operating on
the new address before the old address can go away.
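
One way to put a number on "how many nines" is sketched below. It assumes you can count requests (or distinct clients) arriving on the old and new addresses over the same window; the function names and the default target of three nines are hypothetical, not something the thread prescribes.

```python
import math

def nines_on_new(old_hits, new_hits):
    """Return how many nines of observed traffic use the new address,
    e.g. 1 old hit out of 1000 total is about 3 nines (99.9%)."""
    total = old_hits + new_hits
    if total == 0:
        raise ValueError("no traffic observed")
    if old_hits == 0:
        return math.inf  # nothing left on the old address in this sample
    return -math.log10(old_hits / total)

def can_retire_old(old_hits, new_hits, target_nines=3.0):
    """True once the share of traffic on the new address meets the target."""
    return nines_on_new(old_hits, new_hits) >= target_nines
```

The counts are only as good as the sampling window: clients that show up weekly or yearly will not appear in a short sample.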

There is some truth to that. The combination of applications having their
own (broken) resolver libraries and operating systems that provide even
more broken resolvers (thanks, Redmond) has made this a bigger challenge
than it should be. The ideal solution is to go back to using the OS resolver
libraries and fix them.

Best of luck actually achieving that.

Owen


