nanog mailing list archives
Re: BGP Experiment
From: Saku Ytti <saku () ytti fi>
Date: Fri, 25 Jan 2019 09:58:52 +0200
On Thu, 24 Jan 2019 at 18:43, <adamv0025 () netconsultings com> wrote:
We fight with that all the time. I'd say that of the whole Design -> Certify -> Deploy -> Verify -> Monitor service lifecycle time budget, service certification testing takes almost half. That's why I'm so interested in a model-driven design and testing approach.
This shop has 100% automated blackbox testing, and still they have to cherry-pick what to test. Do you have statistics on how often you find show-stopper issues and how far into the test they were found? I expect this to be an exponential curve: upgrading the box, getting your signalling protocols up and pushing one packet through each service you sell is easy and fast; I wonder whether a massive amount of additional work increases confidence significantly beyond that. The issues I tend to find in production are issues which are not trivial to recreate in the lab even once we know what they are, which implies that finding them a priori is a bit of a naive expectation.

So, assumptions:

a) blackbox testing has exponentially diminishing returns: quickly you need to expend massively more effort to gain only slightly more confidence
b) you can never say 'x works', you can only say 'I found a way to confirm x is not broken in this very specific case'; the way x ends up being broken may be very complex
c) if recreating issues you know about is hard, then finding issues you don't know about is massively more difficult
d) testing likely increases your comfort to deploy more than your probability of success

Hopefully we'll enter a NOS future where we download the NOS from GitHub and compile it for our devices, allowing the whole community to contribute to unit testing and use cases, and letting you run a minimal bug-surface code base in your environment. I see very little future in blackbox testing a vendor NOS at the operator site, beyond a quick poke in the lab; it seems like poor value. Rather have a pessimistic deployment plan: lab => staging => 2-3 low-risk sites => 2-3 high-risk sites => slow rollout.
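The pessimistic deployment plan above can be sketched as a simple promotion pipeline. This is an illustrative sketch only, not anything from the thread: the stage names and the `deploy`/`health_check` callbacks are hypothetical placeholders for whatever tooling an operator actually has.

```python
# Hypothetical sketch of a staged, pessimistic rollout:
# a release only advances to the next stage after it has
# survived a number of soak checks at the current one.

STAGES = ["lab", "staging", "low-risk-site", "high-risk-site", "fleet"]

def promote(release, deploy, health_check, soak_checks=3):
    """Deploy `release` stage by stage; halt at the first failed soak check.

    `deploy(release, stage)` pushes the release to a stage;
    `health_check(release, stage)` returns True when the stage looks healthy.
    Both are operator-supplied callbacks (assumed, not real APIs).
    """
    for stage in STAGES:
        deploy(release, stage)
        for _ in range(soak_checks):
            if not health_check(release, stage):
                # In practice: roll back and investigate before retrying.
                return f"halted at {stage}"
    return "fully deployed"
```

The point of the sketch is the shape, not the details: confidence comes from slowly widening blast radius in production, not from ever-larger lab test matrices.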
I really need this ever-growing library of test cases that the automation will churn through with very little human intervention, in order to reduce the testing time from months to days, or at least weeks.
Many vendors, maybe all, accept your configurations and test them against releases. I think this is the only viable solution vendors have for blackbox testing: gather configs from customers and test those, instead of trying to guess what to test. I've done that with Cisco in two companies; unfortunately I can't really tell whether it impacted quality, but I like to think it did. -- ++ytti
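The gather-customer-configs approach implies a deduplication step: the vendor does not need every config, only one representative per distinct feature combination. A minimal sketch of that idea, assuming a crude keyword match stands in for real config parsing (the feature list is an illustrative placeholder):

```python
# Sketch (assumption, not any vendor's actual process): fingerprint
# each collected config by the features it exercises, then keep one
# representative config per distinct feature combination.

FEATURES = ("bgp", "mpls", "vrrp", "qos")

def fingerprint(config_text):
    """Frozen set of features a config exercises (crude keyword match)."""
    lower = config_text.lower()
    return frozenset(f for f in FEATURES if f in lower)

def representatives(configs):
    """One config per distinct feature combination, first seen wins."""
    seen = {}
    for cfg in configs:
        seen.setdefault(fingerprint(cfg), cfg)
    return list(seen.values())
```

A real pipeline would parse the configuration model properly rather than grep for keywords, but the payoff is the same: the vendor's release testing is driven by combinations customers actually run.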