nanog mailing list archives

Re: Your opinion on network analysis in the presence of uncertain events


From: "Vanbever Laurent" <lvanbever () ethz ch>
Date: Thu, 17 Jan 2019 08:06:54 +0000

Hi Adam/Mel,

Thanks for chiming in!

> My understanding was that the tool will combine historic data with the MTBF datapoints from all components involved in
> a given link in order to try to estimate the likelihood of a link failure.

Yep. This could be one way indeed. This likelihood could also take the form of intervals in which you expect the
true value to lie (again, based on historical data). This could be done both for link/device failures and for
external inputs such as BGP announcements (to consider the likelihood that you receive a route for X in, say, NEWY).
The tool would then run the deterministic routing protocols (not accounting for ‘features’ such as
prefer-oldest-route for a sec.) on these probabilistic inputs so as to infer the different possible forwarding outcomes
and their relative probabilities. That is roughly what we have in mind for now.
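To make the idea concrete, here is a minimal sketch of that inference loop. Everything in it is made up for illustration: the four-node topology, the MTBF figures, and the exponential (memoryless) failure model used to turn MTBF into a per-window failure probability. A plain shortest-path computation stands in for the deterministic routing protocol; a Monte Carlo loop samples link states and tallies the resulting next hops and their relative frequencies.

```python
import math
import random
from collections import Counter

def failure_prob(mtbf_hours, window_hours):
    """P(at least one failure in the window), assuming an exponential
    (memoryless) failure model with the given MTBF."""
    return 1.0 - math.exp(-window_hours / mtbf_hours)

# Hypothetical 4-node topology; per-link failure probability over a
# 24h window, derived from made-up MTBF figures.
LINKS = {
    ("A", "B"): failure_prob(mtbf_hours=50_000, window_hours=24),
    ("A", "C"): failure_prob(mtbf_hours=10_000, window_hours=24),
    ("B", "D"): failure_prob(mtbf_hours=25_000, window_hours=24),
    ("C", "D"): failure_prob(mtbf_hours=25_000, window_hours=24),
}

def shortest_next_hop(up_links, src, dst):
    """Deterministic 'routing protocol' stand-in: BFS shortest path,
    returning src's next hop towards dst, or None if unreachable."""
    adj = {}
    for (u, v) in up_links:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    parent, frontier = {src: None}, [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in parent:
                    parent[v] = u
                    nxt.append(v)
        frontier = nxt
    if dst not in parent:
        return None
    node = dst
    while parent[node] != src:
        node = parent[node]
    return node

def forwarding_outcomes(src, dst, trials=10_000, seed=0):
    """Sample link-up/down states and tally the forwarding outcomes."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        up = {l for l, p in LINKS.items() if rng.random() >= p}
        tally[shortest_next_hop(up, src, dst)] += 1
    return {hop: n / trials for hop, n in tally.items()}

print(forwarding_outcomes("A", "D"))
```

The output maps each possible next hop at A towards D (including None, i.e. no route) to its estimated probability; a real tool would of course do this symbolically or with a far richer protocol model rather than plain BFS.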

One can of course make the model more and more complex by, e.g., also taking into account data-plane status (to model
gray failures). Intuitively though, the more complex the model, the more complex the inference process becomes.
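A rough way to see why: assuming independent links with k possible states each, exact inference has to weigh k^n joint states for n links, so moving from a binary up/down model to a three-state up/down/gray one grows the space from 2^n to 3^n.

```python
# Joint state space of n independent links with k states each is k**n.
for n in (10, 20, 30):
    print(f"{n} links: up/down {2**n:,} states, with gray {3**n:,} states")
```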

> Heck, I imagine if one would stream a heap load of data at an ML algorithm it might draw some very interesting
> conclusions indeed, i.e. draw unforeseen patterns across huge datasets while trying to understand the overall system
> (network) behaviour. Such a tool might teach us something new about our networks.
> Next level would be recommendations on how to best address some of the potential pitfalls it found.

Yes. I believe some variants of this exist already. I’m not sure how much they are used in practice though. AFAICT,
false positives/negatives are still a big problem. A non-trivial recommendation system will require a model of the
network behavior that can somehow be inverted easily, which is probably something academics should spend some time on :-)

> Maybe in closed systems like IP networks, with use of streaming telemetry from SFPs/NPUs/LC-CPUs/Protocols/etc., we’ll
> be able to feed the analytics tool with enough data to allow it to make fairly accurate predictions (i.e. unlike in
> weather or market prediction tools, where the datasets (or search space, as not all attributes are equally relevant)
> are virtually endless).

I’m with you. I also believe that better (even programmable) telemetry will unlock powerful analysis tools.

Best,
Laurent


PS: Thanks a lot to those who have already answered our survey! For those who haven’t yet: 
https://goo.gl/forms/HdYNp3DkKkeEcexs2 (it only takes a couple of minutes).
