nanog mailing list archives
Re: Do you care about "gray" failures? Can we (network academics) help? A 10-min survey
From: Saku Ytti <saku () ytti fi>
Date: Fri, 9 Jul 2021 08:45:45 +0300
On Fri, 9 Jul 2021 at 00:01, William Herrin <bill () herrin us> wrote:
I would suggest that your customer does care, but as there is no
Most don't.

Somewhat recently we were dropping a non-trivial number of packets from a well-known book store due to DMAC failure. This was unexpected, considering it was an L3-to-L3 connection. It was an LACP bundle with a large number of interfaces, and the issue affected just one interface in the bundle.

After we informed the customer about the problem, while it was still occurring, they could not observe it. They looked at their stats, and whatever was being dropped was drowned in the noise; it was not an actionable signal to them. The customer wasn't willing to remove the broken interface from the bundle, as they could not observe the problem.

We did migrate that port to a working port, and after 3 months we agreed with the vendor to stop troubleshooting it. The vendor can see that they had misprogrammed their hardware, but they were not able to figure out why, and therefore it is not fixed. A very large number of cycles were spent at the vendor and operator, and a small amount of work (checking TCP resends etc.) at the customer, trying to solve it.

The reason we contacted the customer is that we were dropping quite a large number of packets; I can easily find 100 real, smaller problems in our network immediately.

The customer was /not/ wrong; the customer did exactly the right thing. There are a lot of problems, and you can go deep into the rabbit hole trying to fix problems which are real but don't affect enough packets to have a meaningful impact on product quality.

--
++ytti
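To illustrate why loss on a single bundle member can vanish in aggregate statistics, here is a minimal Python sketch. It assumes traffic hashes evenly across the members, and every number and member name in it is hypothetical; none of it comes from the incident described above.

```python
from typing import Dict, Tuple


def aggregate_loss(members: int, bad_member_loss: float) -> float:
    """Loss fraction seen across the whole bundle when exactly one member
    drops `bad_member_loss` of its traffic (assumes even hashing)."""
    if members < 1:
        raise ValueError("members must be >= 1")
    if not 0.0 <= bad_member_loss <= 1.0:
        raise ValueError("bad_member_loss must be between 0 and 1")
    return bad_member_loss / members


def worst_member(drops: Dict[str, int], packets: Dict[str, int]) -> Tuple[str, float]:
    """Return the member with the highest per-member drop ratio."""
    ratios = {
        name: drops.get(name, 0) / count
        for name, count in packets.items()
        if count > 0
    }
    if not ratios:
        raise ValueError("no member carried any packets")
    name = max(ratios, key=lambda n: ratios[n])
    return name, ratios[name]


# Hypothetical counters: 16 members, 1,000,000 packets each,
# and one member dropping 2% of what it carries.
packets = {f"member-{i}": 1_000_000 for i in range(16)}
drops = {name: 0 for name in packets}
drops["member-7"] = 20_000

bundle_ratio = sum(drops.values()) / sum(packets.values())
name, member_ratio = worst_member(drops, packets)
print(f"bundle-wide loss: {bundle_ratio:.4%}")
print(f"worst member {name}: {member_ratio:.4%}")
```

Under these assumed numbers, the 2% loss on one member shows up as only 0.125% across the bundle, which is the kind of figure that is easy to lose in normal retransmission noise. Comparing per-member ratios instead of the bundle total is what makes the outlier visible.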