nanog mailing list archives

How reliable does the Internet need to be? (Was: Re: Converged Network Threat)


From: Steve Gibbard <scg () gibbard org>
Date: Wed, 25 Feb 2004 16:30:15 -0800 (PST)


Having woken up this morning and realized it was raining in my bedroom
(last night was the biggest storm the Bay Area has had since my house got
its new roof last summer), and then having moved from cleaning up that
mess to vacuuming water out of the basement after the city's storm sewer
overflowed (which seems to happen to everybody in my neighborhood a couple
of times a year), I've spent lots of time today thinking about general
expectations of reliability.  In the telecommunications industry, where we
tend to treat reliability as very important and any outage as a disaster,
hopefully the questions I've been coming up with aren't career ending. ;)
With that in mind, how much in the way of reliability problems is it
reasonable to expect our users to accept?

If the Internet is a utility, or more generally infrastructure our society
depends on, it seems there are a bunch of different systems to compare it
to.  In general, if I pick up my landline phone, I expect to get a
dialtone, and I expect to be able to make a call.  If somebody calls my
landline, I expect the phone to ring, and if I'm near the phone I expect
to be able to answer.  Yet, if I want somebody to actually get through to
me reliably, I'll probably give them my cell phone number instead.  If it
rings, I'm far more likely to be able to answer it easily than I am my
landline, since the landline phone is in a fixed location.  Yet some
significant portion of calls to or from my cell phone come in when I'm in
areas with bad reception, and the conversation becomes barely
understandable.  In many cases, the signal is too weak to make a call at
all, and those who call me get sent straight to voicemail.  Most of us put
up with this, because we judge mobility to be more important than
reliability.

I don't think I've ever had a natural gas outage that I've noticed, but
most of my gas appliances won't work without electric power.  I seem to
lose electric power at home for a few hours once a year or so, and after
the interruption life tends to resume as it was before.  When power outages
were significantly more frequent, and due to rationing rather than to
accidents, it caused major political problems for the California
government.  There must be some threshold for what people are willing to
accept in terms of residential power outages, and it's somewhere above 2-3
hours per year.

In Ann Arbor, Michigan, where I grew up, the whole town tended to pretty
much grind to a halt two or three days a year, when more snow fell than
the city had the resources to deal with.  The quantity of snow necessary
to cause that was probably four or five inches.  My understanding is that
Minneapolis and Washington DC both grind to a halt due to snow with
somewhat similar frequency, but the amount of snow required is
significantly more in Minneapolis and significantly less in DC.  Again,
there must be some threshold of interruptions due to exceptionally bad
weather that are tolerated, which nobody wants to do worse than and nobody
wants to spend the money to do better than.

So, it appears that among general infrastructure we depend on, there are
probably the following reliability thresholds:

Employees not being able to get to work due to snow: two to three days per
year.
Berkeley storm sewers: overflow two to three days per year.
Residential Electricity: out two to three hours per year.
Cell phone service: Somewhat better than nine fives of reliability ;)
Landline phone service:  I haven't noticed an outage on my home lines in a
few years.
Natural gas: I've never noticed an outage.
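
To put rough numbers on those thresholds, here is a minimal sketch that
converts hours of interruption per year into an availability percentage.
It assumes a 365-day year and treats each interruption as a complete
outage for its full duration; the helper name availability() and the
labels in the table are illustrative, and the hour figures use the upper
end of each range above.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(downtime_hours_per_year: float) -> float:
    """Return the fraction of the year the service is up (0.0 to 1.0)."""
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR


# Upper end of each range in the list above, expressed in hours per year.
thresholds = {
    "getting to work despite snow (3 days/year lost)": 3 * 24,
    "Berkeley storm sewers (3 days/year overflowing)": 3 * 24,
    "residential electricity (3 hours/year out)": 3,
}

for name, hours in thresholds.items():
    print(f"{name}: {availability(hours) * 100:.3f}% available")
```

By this measure, three days a year works out to roughly 99.2%
availability, and three hours a year to roughly 99.97%.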

How Internet service fits into that of course depends on how you're
accessing the Net.  The T-Mobile GPRS card I got recently seems
significantly less reliable than my cell phone.  My SBC DSL line is almost
at the reliability level of my landline phone or natural gas service,
except that the DSL router in my basement doesn't work when electric power
is out.  I'm probably poorly qualified to talk about the end-user
experience on the networks I actually work on, even if I had permission
to.  Like pretty much everybody else here, I'm always interested in doing
better on reliability.  And, like many of my neighbors, I'd like to be
able to store stuff on my basement floor.  In comparison to a lot of other
infrastructure we depend on, it seems to me the Internet is already doing
pretty well.

-Steve

On Wed, 25 Feb 2004, Jared Mauch wrote:


      Ok.

      I can't sit by here while people speculate about the possible
problems of a network outage.

      I think that most everyone here reading NANOG realizes that
the Internet is becoming more and more central to daily life even
for those that are not connected to the internet.

      From where I'm sitting, I see a number of potentially dangerous
trends that could result in some quite catastrophic failures of networks.
No, I'm not predicting that the internet will end in 8^H7 days or anything
like that.  I think the Level3 outage as seen from the outside is a clear
case that single providers will continue to have their own network failures
for some time to come.  (I just hope daily it's not my employer's network ;-) )

      So, we're sitting here at the crossroads, where VoIP is
"coming of age".  Vonage, 8x8 and others are blazing a path that
the rest of the providers are now beginning to gun for.  We've already
read in press releases and articles in the past year how providers
in Canada and the US are moving to VoIP transport within their long-distance
networks.

      I keep hearing of Frame-Relay and ATM signaling that is going
to happen in large providers' MPLS cores.  That's right, your "safe"
TDM-based services will be transported over someone's IP backbone first.
This means if they don't protect their IP network, the TDM services could
fail.  These types of CES services are not just limited to Frame and ATM.
(Did anyone with frame/atm/vpn services from Level3 experience the
same outage?)

      Now the question of Emergency Services is being posed here but also
in parallel by a number of other people at the FCC.  We've seen the E911
recommendation come out regarding VoIP calls.  How long until a simple
power failure results in the inability to place calls?

      Now, I'm not trying to pick on Level3 at all.  The trend I
outline here is very real.  The reliance on the Internet for critical
communications is a trend that continues.  Look at how it was used
on 9/11 for communications when cell and land based telephony networks
were crippled.

      The internet has become a very critical part of all of our lives
(some more than others) with banks using VPNs to link their ATMs back into
their corporate network as well as the number of people that use it for
just plain "just in time" bill payment and other things.  I can literally
cancel my home phone line, cell phone and communicate solely with my
internet connection, performing all my bill payments without any paperwork.
I can even file my taxes online.

      We're at (or already past) the dangerous point of network
convergence.  While I suspect that nobody directly died as a result of
the recent outage, the trend to link together hospitals, doctors
and other agencies via the Internet and a series of VPN clients continues
to grow.  (I say this knowing how important the internet is to
the medical community; reading x-rays and other data scans at home for the
on-call is quite common.)

      While my friends that are local VFD do still have the traditional
pager service with towers, etc... how long until the T1's that are
used for dial-in or speaking to the towers are moved to some sort of
IP based system?  The global economy seems to be going this direction with
varying degrees of caution.

      I'm concerned, but not worried.. the network will survive..

      - Jared


On Wed, Feb 25, 2004 at 09:17:30AM -0600, Pete Templin wrote:
If an IP-based system lets you see the status of the 23 hospitals in San
Antonio graphically, perhaps overlaid with near-real-time traffic
conditions, I'd rather use it as primary and telephone as secondary.

Counting on it?  No.  Gaining usability from it?  You betcha.

Brian Knoblauch wrote:

  If you're counting on IP (a "best attempt" protocol) for critical
data, you've got a serious design flaw in your system...

-----Original Message-----
From: owner-nanog () merit edu [mailto:owner-nanog () merit edu] On Behalf Of
Pete Templin
Sent: Wednesday, February 25, 2004 9:10
To: Colin Neeson
Cc: nanog () merit edu
Subject: Re: Level 3 statement concerning 2/23 events (nothing to see, move
along)




Are you sure no one died as a result?  My hobby is volunteering as a
firefighter and EMT.  If Level3's network sits between a dispatch center
or mobile data terminal and a key resource, it could be a factor
(hospital status website, hazardous materials action guide, VoIP link
that didn't reroute because the control plane was happy but the
forwarding plane was sad, etc.).

And if the problem could happen to another network tomorrow but could be
prevented or patched, wouldn't inquiring minds want to know?  Your life
might be more interesting when the fit hits the shan if you have the
same vulnerability.

Colin Neeson wrote:


Because, in the grand scheme of things, it's really not that
important.

No one died because of it, the normal, everyday events of the world
went on, unaffected by a Level 3 outage...

Might be nice to know what happened, but my life will certainly not be
less interesting by not having that knowledge...

--
Jared Mauch  | pgp key available via finger from jared () puck nether net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.


--------------------------------------------------------------------------------
Steve Gibbard                           scg () gibbard org
+1 415 717-7842 (cell)                  http://www.gibbard.org/~scg
+1 510 528-1035 (home)

