nanog mailing list archives

Re: Reliable Cloud host ?


From: William Herrin <bill () herrin us>
Date: Mon, 27 Feb 2012 10:28:37 -0500

On Sun, Feb 26, 2012 at 7:02 PM, Randy Carpenter <rcarpen () network1 net> wrote:
On Feb 26, 2012, at 4:56 PM, Randy Carpenter wrote:
1. Full redundancy with instant failover to other hypervisor hosts
upon hardware failure (I thought this was a given!)

This is actually a much harder problem to solve than it sounds, and
gets progressively harder depending on what you mean by "failover".

At the very least, having two physical hosts capable of running your
VM requires that your VM be stored on some kind of SAN (usually
iSCSI based) storage system. Otherwise, two hosts have no way of
accessing your VM's data if one were to die. This makes things at
least an order of magnitude more expensive.

This does not have to be true at all.  Even having a fully fault-tolerant
SAN in addition to spare servers should not cost much more than
having separate RAID arrays inside each of the servers, when you
are talking about 1,000s of servers (which Rackspace certainly has).

Randy,

You're kidding, right?

SAN storage costs the better part of an order of magnitude more than
server storage, which itself is several times more expensive than
workstation storage. That's before you duplicate the SAN and set up
the replication process so that cabinet and room level failures don't
take you out.

DR sites then create a ferocious (read: expensive) bandwidth
challenge. Data can't flush from the primary SAN's write cache until
the DR SAN acknowledges receipt. If you don't have enough bandwidth to
keep up under the heaviest daily loads, the cache quickly fills and
the writes block.
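
A rough sketch of that arithmetic, assuming the cache backlog grows at
the difference between the incoming write rate and the replication link
rate. The function name and every figure below are hypothetical,
chosen only to show the shape of the problem; real arrays add
compression, deduplication and burst behavior that this ignores.

```python
def seconds_until_writes_block(cache_bytes, write_rate, link_rate):
    """Estimate how long the primary's write cache lasts under load.

    cache_bytes: size of the primary SAN's write cache (assumed figure)
    write_rate:  sustained incoming writes, in bytes per second
    link_rate:   replication bandwidth to the DR site, in bytes per second

    Returns None when the link keeps up and the cache never fills.
    """
    backlog_growth = write_rate - link_rate  # bytes/second piling up in cache
    if backlog_growth <= 0:
        return None
    return cache_bytes / backlog_growth


GiB = 1024 ** 3
MiB = 1024 ** 2

# Hypothetical: 8 GiB of cache, 200 MiB/s peak writes, 100 MiB/s DR link.
peak = seconds_until_writes_block(8 * GiB, 200 * MiB, 100 * MiB)

# Same cache and link, but an off-peak load the link can absorb.
off_peak = seconds_until_writes_block(8 * GiB, 50 * MiB, 100 * MiB)

print(peak, off_peak)
```

With these made-up numbers the cache is exhausted in under a minute and
a half of peak load, after which every write waits on the DR link.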


I maintain 50ish VMs with about 30 different providers at the moment.
Not one of them attempts to do anything like what you describe.


NetApp. HA heads. Done. Add a DR site with replication,
and you can survive a site failure, and be back up and
running in less than an hour. I would think that the big
datacenter guys already have this type of thing set up.

That's expensive and VMs are sold primarily on price. You want high
reliability, you start with the dedicated colo server. Customers who
want DR in a VM environment buy two VMs and build data replication at
the app layer.


On Mon, Feb 27, 2012 at 9:31 AM, Max <perldork () webwizarddesign com> wrote:
Linode.com is not cloud based but they offer IP failover between VPS
instances at no additional charge. Their pricing is excellent, I have
had no downtime issues with them in 3+ years with 3 different
customers using them, and they have nice OOB and programmatic API
access for controlling VPS instances as well.

Hi Max,

I have had superb results from Linode and highly recommend them.
However, they're facilitating application-level failover, not keeping
your VM magically alive. And:

http://library.linode.com/linux-ha/ip-failover-heartbeat-pacemaker-ubuntu-10.04

"Both Linodes must reside in the same datacenter for IP failover"

So they don't support a full DR capability even if you're smart at the
app level.


On Mon, Feb 27, 2012 at 9:39 AM, Jared Mauch <jared () puck nether net> wrote:
Is the DNS service authoritative or recursive?  If auth, you can
solve this a few ways, for example by giving the DNS name people
point to multiple AAAA (and A) records pointing at a diverse
set of instances.  DNS is designed to work around a host
being down.  Same goes for MX and several other services.
While it may make the service slightly slower, it's certainly
not the end of the world.

Hi Jared,

How DNS is designed to work and how it actually works are not the same.
Look up "DNS Pinning" for example. For most kinds of DR you need
IP-level failover where the IP address is rerouted to the available site.
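
A toy model of the difference, not any real resolver or browser code.
It assumes a "pinning" client keeps the first address it resolved for
the rest of its session, while a client that behaves the way DNS was
designed walks the whole record set. The function, the class, and the
addresses (taken from the documentation ranges) are all hypothetical.

```python
def fallback_client(records, is_up):
    """Try each address in the record set; return the first one that is up."""
    for address in records:
        if is_up(address):
            return address
    return None


class PinningClient:
    """Remembers the first address it ever saw and never looks again."""

    def __init__(self):
        self.pinned = None

    def connect(self, records, is_up):
        if self.pinned is None:
            self.pinned = records[0]  # pinned for the rest of the session
        return self.pinned if is_up(self.pinned) else None


# Two sites published under one name (documentation addresses).
records = ["192.0.2.10", "198.51.100.10"]
up = {"192.0.2.10", "198.51.100.10"}

pinner = PinningClient()
before = pinner.connect(records, up.__contains__)   # pins the primary

up.discard("192.0.2.10")                            # primary site fails

after_pinned = pinner.connect(records, up.__contains__)
after_fallback = fallback_client(records, up.__contains__)
print(before, after_pinned, after_fallback)
```

In this model the pinning client stays down until its session ends,
even though a healthy address is published. Rerouting the IP address
itself avoids depending on what the client does.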

Regards,
Bill Herrin


-- 
William D. Herrin ................ herrin () dirtside com  bill () herrin us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004

