nanog mailing list archives

RE: BGP Experiment


From: <adamv0025 () netconsultings com>
Date: Thu, 31 Jan 2019 09:16:55 -0000

From: Saku Ytti <saku () ytti fi>
Sent: Friday, January 25, 2019 7:59 AM

On Thu, 24 Jan 2019 at 18:43, <adamv0025 () netconsultings com> wrote:

We fight with that all the time.
I'd say that of the whole Design->Certify->Deploy->Verify->Monitor
service lifecycle time budget, service certification testing takes up almost
half.
That's why I'm so interested in a model-driven design and testing approach.

This shop has 100% automated blackbox testing, and still they have to
cherry-pick what to test.

Sure, one tests only for the few specific current and near-future use cases.

Do you have statistics on how often you find show-stopper
issues, and how far into the test they were found?

I don't keep those statistics, but running bug scrubs to determine the code for regression testing is usually a
good starting point for avoiding show-stoppers. What is then found later on during testing is usually patched, so yes,
you end up with brand-new code plus several patches related to your use cases (PEs, Ps, etc.).
   
I expect this to be an
exponential curve: upgrading the box, getting your signalling protocols up,
and pushing one packet through each service you sell is easy and fast. I wonder
whether a massive amount of further work increases confidence significantly beyond that.

Yes it will.

The
issues I tend to find in production are issues which are not trivial to recreate
in the lab even once we know what they are, which implies that finding them a priori
is a bit of a naive expectation. So, assumptions:

This is because you did your due diligence during the testing.
Do you have statistics on the probability of these "complex" bugs occurring?
    
Hopefully we'll enter a NOS future where we download the NOS from GitHub and
compile it for our devices, allowing the whole community to contribute unit
tests and use cases, and letting you run code with a minimal bug surface in your
environment.

We're not there yet, but you can already compile your own routing protocols and run them on a vendor OS.

I see very little future in blackbox testing a vendor NOS at the operator site
beyond a quick poke in the lab; it seems like poor value. I'd rather have a pessimistic
deployment plan: lab => staging => 2-3 low-risk sites =>
2-3 high-risk sites => slow roll-up.

Yes, that's also a possibility, and one of the strong arguments for massive disaggregation at the edge: it reduces the
fallout of a potential critical failure.
It depends on the shop, really.
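That pessimistic roll-up can be pictured as a simple gating loop; the stage names and the health_check hook below are purely hypothetical, standing in for whatever verification a given shop already automates:

```python
# Hypothetical staged-rollout gate: each stage must pass its health
# check before the new code is pushed to the next, riskier stage.
STAGES = ["lab", "staging", "low-risk-site-1", "low-risk-site-2",
          "high-risk-site-1", "high-risk-site-2", "full-roll-up"]

def run_rollout(health_check):
    """Deploy stage by stage; halt at the first failing health check.

    health_check(stage) -> bool is a placeholder for the operator's
    own service verification.
    """
    deployed = []
    for stage in STAGES:
        if not health_check(stage):
            # Halt here: the blast radius is limited to this stage.
            return deployed, stage
        deployed.append(stage)
    return deployed, None

# Example: a regression that only bites at the first high-risk site
# still leaves the earlier, lower-risk stages deployed and healthy.
done, halted_at = run_rollout(lambda stage: stage != "high-risk-site-1")
```

The point of the ordering is that each step buys confidence cheaply before the failure domain grows.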

I really need this ever-growing library of test cases that the automation
will churn through with very little human intervention, in order to reduce the
testing from months to weeks, or at best days.
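One way to keep such a library cheap to grow is to make each case plain data that a generic runner churns through; the case fields and the fake device below are a hypothetical sketch, not any particular test framework:

```python
# Hypothetical data-driven test library: adding a case is appending a
# record, so the library can grow without touching the runner.
LIBRARY = [
    {"name": "bgp-up", "probe": "bgp-session", "expect": "Established"},
    {"name": "ldp-up", "probe": "ldp-session", "expect": "Operational"},
]

def run_library(library, probe_device):
    """Run every case against the device; return the names that failed.

    probe_device(probe) -> str stands in for whatever show-command or
    telemetry query the automation actually issues.
    """
    failures = []
    for case in library:
        if probe_device(case["probe"]) != case["expect"]:
            failures.append(case["name"])
    return failures

# Example: a fake device that answers probes from a dict; one case
# regresses, and only its name comes back.
fake_device = {"bgp-session": "Established", "ldp-session": "Down"}.get
failed = run_library(LIBRARY, fake_device)
```

Because the cases are data, the same runner can replay the full library against every candidate release with no human in the loop.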

A lot of vendors, maybe all, accept your configurations and test them for
releases. I think this is the only viable solution vendors have for blackbox testing: gather
configs from customers and test those, instead of trying to guess what to test.
I've done that with Cisco at two companies; unfortunately I can't really tell whether it
impacted quality, but I like to think it did.
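A sketch of that idea is selecting regression cases by the features actually present in gathered customer configs rather than by guesswork; the keyword matching and the case catalog below are entirely made up for illustration:

```python
# Hypothetical config-driven test selection: only run the cases whose
# feature actually appears in some gathered customer config.
KNOWN_FEATURES = {"bgp", "mpls", "isis", "ospf", "l2vpn", "evpn"}

def features_in(config_text):
    """Naive feature extraction: known keywords present in the config."""
    return {word for word in config_text.split() if word in KNOWN_FEATURES}

def select_cases(configs, case_catalog):
    """Return test-case names whose feature is used by any config.

    case_catalog maps a case name to the single feature it exercises.
    """
    used = set()
    for cfg in configs:
        used |= features_in(cfg)
    return sorted(name for name, feat in case_catalog.items() if feat in used)

# Example: two customer configs, a three-case catalog; the EVPN case
# is skipped because no gathered config uses EVPN.
configs = ["router bgp 65000", "protocols mpls isis"]
catalog = {"bgp-basic": "bgp", "mpls-te": "mpls", "evpn-mh": "evpn"}
selected = select_cases(configs, catalog)
```

Real vendor programs presumably work from full parsed configurations, but the selection principle is the same: the customer base defines the test matrix.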

Did that with Juniper partners and now directly with Juniper.
The thing is, though, they are using our test plan...

adam  

