nanog mailing list archives
Re: Global Akamai Outage
From: Jared Mauch <jared () puck nether net>
Date: Sun, 25 Jul 2021 11:12:43 -0400
Work hat is not on, but context is included from prior workplaces etc.
On Jul 25, 2021, at 2:22 AM, Saku Ytti <saku () ytti fi> wrote:

It doesn't seem like a tenable solution, when the solution is 'do better', since I'm sure whoever did those checks did their best in the first place. So we must assume we have some fundamental limits what 'do better' can achieve, we have to assume we have similar level of outage potential in all work we've produced and continue to produce for which we exert very little control over.
I have seen a very strong culture around risk, and risk avoidance whenever possible, at Akamai. Some minor changes are taken very seriously. I appreciate that on a daily basis, and when mistakes are made (I am human after all), reviews of the mistakes and corrective steps are planned and followed up on. I'm sure this time will not be different.

I also get how easy it is to be cynical about these issues. There's always someone with power who can break things, but those same people can also often fix them just as fast.

Focus on how you can do a transactional routing change and roll it back, how you can test, etc. This is why for years I told one vendor that had a line-by-line parser that their system was too unsafe for operation.

There are also other questions, like: How can we improve response times when things are routed poorly? Time to mitigate hijacks is improved by the majority of providers doing RPKI OV, but interprovider response time scales are much longer.

I also think about the two big CTL long-haul and routing issues last year. How can you mitigate these externalities?

- Jared
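To make the "transactional routing change and roll it back" point concrete, here is a minimal sketch of the idea, not anything Akamai or a particular vendor actually runs. The `Router` class, `TransactionError`, and the `validate` and `health_check` callbacks are all hypothetical names invented for illustration. The whole change is validated as one candidate before it touches the running state, and a failed post-commit check restores the previous state, in contrast to a line-by-line parser that applies each line as it is read.

```python
from typing import Callable, Dict

# Hypothetical model: a routing config is just prefix -> next hop.
Config = Dict[str, str]


class TransactionError(Exception):
    """Raised when a change is rejected or rolled back."""


class Router:
    """Toy in-memory device model; an assumption, not a real vendor API."""

    def __init__(self, running: Config) -> None:
        self.running: Config = dict(running)

    def commit(
        self,
        changes: Config,
        validate: Callable[[Config], bool],
        health_check: Callable[["Router"], bool],
    ) -> None:
        # Build the full candidate first; the running config is untouched
        # until the whole change set has been validated.
        candidate = dict(self.running)
        candidate.update(changes)
        if not validate(candidate):
            raise TransactionError("candidate rejected; nothing applied")

        # Apply all changes at once, keeping the old state for rollback.
        previous = self.running
        self.running = candidate

        # Post-commit check (for example, "can I still reach my peers?").
        if not health_check(self):
            self.running = previous
            raise TransactionError("post-commit check failed; rolled back")


router = Router({"192.0.2.0/24": "198.51.100.1"})

# A change that passes both checks is applied as a unit.
router.commit(
    {"203.0.113.0/24": "198.51.100.2"},
    validate=lambda cfg: all(cfg.values()),
    health_check=lambda r: True,
)

# A change that fails the post-commit check leaves the old state in place.
try:
    router.commit(
        {"192.0.2.0/24": "198.51.100.99"},
        validate=lambda cfg: True,
        health_check=lambda r: False,
    )
except TransactionError as exc:
    print(f"change refused: {exc}")
```

The design choice worth noting is that a half-applied change never exists: either the full candidate replaces the running state, or the running state stays exactly as it was.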