nanog mailing list archives
Re: Cloudflare is down
From: Leo Bicknell <bicknell () ufp org>
Date: Mon, 4 Mar 2013 06:51:31 -0800
In a message written on Mon, Mar 04, 2013 at 09:31:13AM +0200, Saku Ytti wrote:
> Probably only thing you could have done to plan against this, would have been to have solid dual-vendor strategy, to presume that sooner or later, software defect will take one vendor completely out. And maybe they did plan for it, but decided dual-vendor costs more than the rare outages.
From what I have heard so far there is something else they could have done: hire higher quality people. Any competent network admin would have stopped and questioned a 90,000+ byte packet and done more investigation. Competent programmers writing their internal tools would have flagged that data as out of range.

I can't tell you how many times I've sat in a post-mortem meeting about some issue and the answer from senior management is "why don't you just provide a script to our NOC guys, so the next time they can run it and make it all better." Of course it's easy to say that; the smart people have diagnosed the problem!

You can buy these "scripts" for almost any profession. There are manuals on how to fix everything on a car, and treatment plans for almost every disease. Yet most people intuitively understand you take your car to a mechanic and your body to a doctor for the proper diagnosis. The primary thing you're paying for is expertise in what to fix, not how to fix it. That takes experience and training. But somehow it doesn't sink in with networking.

I would not at all be surprised to hear that someone over at Cloudflare right now is saying "let's make a script to check the packet size" as if that will fix the problem. It won't. Next time the issue will be different, and the same undertrained person who missed the packet size this time will miss the next issue as well. They should all be sitting around saying, "how can we hire competent network admins for our NOC", but that would cost real money.

-- 
Leo Bicknell - bicknell () ufp org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
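For illustration, a minimal sketch of the kind of out-of-range check an internal rule-push tool could apply to a packet-length match before sending it to routers. It assumes IPv4-only rules (IPv6 jumbograms are ignored), and the function and constant names are hypothetical, not anything Cloudflare actually runs. The bound comes from the IPv4 header's 16-bit Total Length field, which caps a packet at 65,535 bytes, so a 90,000+ byte value can never match a real IPv4 packet.

```python
from typing import List

# Assumption: rules match on IPv4 total length only (no IPv6 jumbograms).
IPV4_MAX_TOTAL_LENGTH = 65_535  # largest value of the 16-bit Total Length field
IPV4_MIN_HEADER_LENGTH = 20     # smallest possible IPv4 header, in bytes


def validate_packet_length_range(low: int, high: int) -> List[str]:
    """Hypothetical helper: list problems with a packet-length match range.

    An empty list means the range is plausible; anything else should make
    the tool stop and ask a human before pushing the rule.
    """
    problems = []
    if low > high:
        problems.append(f"range is inverted: {low} > {high}")
    for value in (low, high):
        # Flag any bound that no real IPv4 packet could have.
        if value < IPV4_MIN_HEADER_LENGTH or value > IPV4_MAX_TOTAL_LENGTH:
            problems.append(
                f"{value} bytes is outside "
                f"{IPV4_MIN_HEADER_LENGTH}-{IPV4_MAX_TOTAL_LENGTH}"
            )
    return problems


if __name__ == "__main__":
    print(validate_packet_length_range(64, 1500))      # plausible range
    print(validate_packet_length_range(90000, 90100))  # impossible for IPv4
```

A check like this only catches one class of bad input; as argued above, it does not replace an operator who stops and questions the data.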