nanog mailing list archives

Re: MCI WorldCom fiber cut in White Plains, NY


From: Sean Donelan <sean () donelan com>
Date: 10 Oct 1999 21:35:22 -0700


> i guess fibre is just not being laid in rings any more?  why on earth
> should a single backhoe cut be able to take ANY circuit out?

I've wondered about this for a few years.  The only answer I've
been able to come up with is "implementation."  It's the same issue
anytime the real world and theory collide.

Of course, the salespeople give the most optimistic prediction
for the performance of their product.  But engineers aren't always
much better.  There are a lot of assumptions, guesstimates, and
business trade-offs along the way to the final product.

The biggest assumption is the ring is in fact a "ring," and not
a "string" wrapped back on itself.  Eventually even the accountants
catch on to the fact you can sell almost double the fiber miles by
selling both halves of the ring.  If you see Nx1 protect, run.

Another big assumption is the fiber equipment along the ring itself
will work.  There are a lot of active elements along the ring.  The
assumption that only one thing will fail at a time (especially when humans
are involved) isn't a safe one.  Your protect circuit may have
failed a long time ago, but your transport layer won't find out until
it needs it.  Your ring isn't interrupted just twice, at PAIX and
AMES; it most likely passes through many active and passive points
around the ring, most of which you won't know about until they fail.

If you have enough clout, walk the circuit with the carrier from
demarc to demarc.  Most will give you a song and dance about security
concerns, but it helps to have the people who hand out security
clearances on your side :-)  I did once, and what was amazing wasn't
that the circuit was screwed up, but that the circuit worked at all.
Not that it will really help in the long term, because the carrier
will just re-groom the circuit a few weeks or months later.

Lots of people have suggestions such as asking the carrier for
actual route maps, design layout records, and even penalties.  But
I haven't found anything that really works.  And it's not just my
incompetence; other people have had problems too.  Both NORAD and
the FAA had triple-redundant circuits cut.  Southwestern Bell had
three different fiber cuts in the past six months take out service
for three to eighteen counties, including 9-1-1 and SS7 links.

And even if you have done everything right, there is still a final
dirty word: "pre-emption."  If you kept your network up, and someone
with a higher priority circuit didn't, the carrier can take your
working circuit to restore the higher priority circuit.

If there is a technical solution, I would love to hear it.  But
it seems to involve more than just technology.




