nanog mailing list archives
Re: BGP Experiment
From: Saku Ytti <saku () ytti fi>
Date: Fri, 25 Jan 2019 09:58:52 +0200
On Thu, 24 Jan 2019 at 18:43, <adamv0025 () netconsultings com> wrote:
We fight with that all the time. I'd say that of the whole Design -> Certify -> Deploy -> Verify -> Monitor service lifecycle time budget, service certification testing takes almost half. That's why I'm so interested in a model-driven design and testing approach.
This shop has 100% automated blackbox testing, and still they have to cherry-pick what to test. Do you have statistics on how often you find show-stopper issues and how far into the test they were found? I expect this to be an exponential curve: upgrading the box, getting your signalling protocols up and pushing one packet through each service you sell is easy and fast; I wonder whether a massive amount of additional work increases confidence significantly beyond that. The issues I tend to find in production are issues which are not trivial to recreate in the lab even once we know what they are, which implies that finding them a priori is a bit of a naive expectation.

So, assumptions:

a) blackbox testing has exponentially diminishing returns: quickly you need to expend massively more effort to gain only slightly more confidence
b) you can never say 'x works', you can only say 'I found a way to confirm x is not broken in this very specific case'; the way x ends up being broken may be very complex
c) if recreating issues you know about is hard, then finding issues you don't know about is massively more difficult
d) testing likely increases your comfort to deploy more than your probability of success

Hopefully we'll enter a NOS future where we download the NOS from GitHub and compile it for our devices, allowing the whole community to contribute to unit testing and use cases, and letting you run a minimal bug-surface code base in your environment. I see very little future in blackbox testing a vendor NOS at the operator site, beyond a quick poke in the lab; it seems like poor value. Rather have a pessimistic deployment plan: lab => staging => 2-3 low-risk sites => 2-3 high-risk sites => slow rollout.
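The pessimistic deployment plan above can be sketched as a simple promotion pipeline. This is an illustrative sketch only, not anything from the thread: the stage names and the `deploy`/`health_check` callbacks are hypothetical placeholders for whatever tooling an operator actually has.

```python
# Hypothetical sketch of a staged, pessimistic rollout:
# a release only advances to the next stage after it has
# survived a number of soak checks at the current one.

STAGES = ["lab", "staging", "low-risk-site", "high-risk-site", "fleet"]

def promote(release, deploy, health_check, soak_checks=3):
    """Deploy `release` stage by stage; halt at the first failed soak check.

    `deploy(release, stage)` pushes the release to a stage;
    `health_check(release, stage)` returns True when the stage looks healthy.
    Both are operator-supplied callbacks (assumed, not real APIs).
    """
    for stage in STAGES:
        deploy(release, stage)
        for _ in range(soak_checks):
            if not health_check(release, stage):
                # In practice: roll back and investigate before retrying.
                return f"halted at {stage}"
    return "fully deployed"
```

The point of the sketch is the shape, not the details: confidence comes from slowly widening blast radius in production, not from ever-larger lab test matrices.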
I really need this ever-growing library of test cases that the automation will churn through with very little human intervention, in order to reduce the testing time from months to days, or at least weeks.
Many vendors, maybe all, accept your configurations and test them against releases. I think this is the only viable solution vendors have for blackbox testing: gather configs from customers and test those, instead of trying to guess what to test. I've done that with Cisco in two companies; unfortunately I can't really tell whether it impacted quality, but I like to think it did. -- ++ytti
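The gather-customer-configs approach implies a deduplication step: the vendor does not need every config, only one representative per distinct feature combination. A minimal sketch of that idea, assuming a crude keyword match stands in for real config parsing (the feature list is an illustrative placeholder):

```python
# Sketch (assumption, not any vendor's actual process): fingerprint
# each collected config by the features it exercises, then keep one
# representative config per distinct feature combination.

FEATURES = ("bgp", "mpls", "vrrp", "qos")

def fingerprint(config_text):
    """Frozen set of features a config exercises (crude keyword match)."""
    lower = config_text.lower()
    return frozenset(f for f in FEATURES if f in lower)

def representatives(configs):
    """One config per distinct feature combination, first seen wins."""
    seen = {}
    for cfg in configs:
        seen.setdefault(fingerprint(cfg), cfg)
    return list(seen.values())
```

A real pipeline would parse the configuration model properly rather than grep for keywords, but the payoff is the same: the vendor's release testing is driven by combinations customers actually run.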