nanog mailing list archives

Re: Famous operational issues


From: Rogier van Eeten via NANOG <nanog () nanog org>
Date: Wed, 17 Feb 2021 15:57:38 +0100

Ahh, war stories. I like the one where I got a wake-up call that our IRC server was on fire, together with the rest of the DC.


Not that widespread, but we reached Slashdot. :)

November 2002, University of Twente, The Netherlands. Some idiot wanted to be a hero. He deflated people's tires so that he could help inflate them. One morning he thought it would be a good idea to start a small fire and then extinguish it, so he would be the hero who stopped a fire. He failed and the building burned down. He got caught a few days later when he tried the same thing in a different building.

Almost all of the IT was in that building, including the core network and the uplinks to SURFNet (the Dutch educational network) and to the 2000 students living on campus. Ironically, a new DC was already being built, and it was ready for use a few weeks later.

As we had quite a network for 2002, we hosted, for instance, security.debian.org. The students all had 100Mbit in their rooms, so some of them also hosted popular websites. One I can remember was an image sharing site.

Some students immediately created a backup network: a DHCP server, a DNS server with a catch-all, a website explaining what was going on, an IRC server, etc.

A local ISP offered to sponsor 50Mbit for the residents, which was connected via a microwave relay, and a temporary fiber was run through a ditch to connect two parts of the campus residences. At the end of the day all 2000 students had their internet connection back, although all behind a single 50Mbit link.


Syslog message from the local SURFNet router:

lo0.ar5.enschede1.surf.net 3613: Nov 20 07:20:50.927 UTC: %ENV_MON-2-TEMP: Hotpoint temp sensor(slot 18) temperature has reached WARNING level at 61(C)


(Disclaimer: Where I say "we", I mean we as the University. I wasn't working for the university, but was one of the students working on the backup network. There are probably other people on the list with more details, and I've probably missed some, but this is the summary.)


On 16-02-2021 23:08, Jared Mauch wrote:
I was thinking about how we need a war stories nanog track. My favorite was being on call when the router was stolen.

Sent from my TI-99/4a

On Feb 16, 2021, at 2:40 PM, John Kristoff <jtk () dataplane org> wrote:

Friends,

I'd like to start a thread about the most famous and widespread Internet
operational issues, outages or implementation incompatibilities you
have seen.

Which examples would make up your top three?

To get things started, I'd suggest the AS 7007 event is perhaps the
most notorious and likely to top many lists including mine.  So if
that is one for you I'm asking for just two more.

I'm particularly interested in this as the first step in developing a
future NANOG session.  I'd be particularly interested in any issues
that also identify key individuals that might still be around and
interested in participating in a retrospective.  I already have someone
that is willing to talk about AS 7007, which shouldn't be hard to guess
who.

Thanks in advance for your suggestions,

John

