nanog mailing list archives

Re: Validating multi-path in production?


From: Adam Thompson <athompson () merlin mb ca>
Date: Sun, 14 Nov 2021 16:20:04 +0000

The problem I'm looking to solve is the logical opposite, I think: I want to demonstrate that no links are 
malfunctioning in such a way that packets on a certain path are getting silently dropped.  Which has some "proving a 
negative" aspects to it, unfortunately.
I think the only way I can demonstrate it is to determine that every single multi-path/hashed-member link is working, 
which is... hard.  Especially if I need to deal with the combinatoric explosion - I *think* I can skip that part.
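The "every single hashed member" requirement can be roughed out in numbers. Assume, as an idealization (real device hashes are neither random nor uniform for all traffic), that each test flow lands on one of n members uniformly at random and independently. This is then the coupon-collector problem: about n·H_n flows are expected before every member has carried at least one, and inclusion-exclusion gives the probability of full coverage after k flows. Covering every member is only a necessary condition for showing none is silently dropping traffic, not proof of it. A sketch under those assumptions:

```python
from fractions import Fraction
from math import comb


def expected_flows_to_cover(n_members: int) -> float:
    """Coupon-collector expectation n * H_n, assuming uniform random hashing."""
    return n_members * sum(1 / i for i in range(1, n_members + 1))


def prob_all_covered(n_members: int, n_flows: int) -> float:
    """P(every member receives >= 1 of n_flows flows), by inclusion-exclusion.

    Exact Fractions avoid floating-point cancellation in the alternating sum.
    """
    total = sum(
        (-1) ** j * comb(n_members, j) * Fraction(n_members - j, n_members) ** n_flows
        for j in range(n_members + 1)
    )
    return float(total)


def flows_needed(n_members: int, target: float = 0.999) -> int:
    """Smallest flow count whose full-coverage probability reaches target."""
    k = n_members  # fewer flows than members can never cover them all
    while prob_all_covered(n_members, k) < target:
        k += 1
    return k


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, "members:", round(expected_flows_to_cover(n), 1), "expected,",
              flows_needed(n), "for 99.9% coverage")
```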
-Adam

________________________________
From: James Bensley <jwbensley+nanog () gmail com>
Sent: Sunday, November 14, 2021 5:29:25 AM
To: Adam Thompson <athompson () merlin mb ca>; nanog <nanog () nanog org>
Subject: Re: Validating multi-path in production?

On Fri, 12 Nov 2021 at 16:54, Adam Thompson <athompson () merlin mb ca> wrote:
The best I've come up with so far is to have two test systems (typically VMs) that use adjacent IP addresses and 
adjacent MAC addresses, and test both inbound and outbound to/from those, blindly trusting/hoping that hashing 
algorithms will probably exercise both paths.

If the goal is to test that traffic *is* being distributed across multiple links based on traffic headers, then you can 
definitely roll your own. I think the hard part is orchestrating it (feeding your topology data into the tool, running the 
tool, getting the results out, interpreting the results, etc.).

A couple of public examples:
https://github.com/facebookarchive/UdpPinger
https://www.youtube.com/watch?v=PN-4JKjCAT0

If you do roll your own, you need to tailor the tests to your topology and your equipment. For example, you can have 
two VMs as you mentioned, one at each end of the network. Then, if your network uses a 5-tuple hash for ECMP inside 
the core, you could send many flows between the two VMs, rotating the source port, to ensure that all links in a LAG 
or all ECMP paths are used.
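The source-port rotation idea can be sketched as follows: hold the addresses, protocol and destination port fixed and vary only the source port, so every probe is a distinct 5-tuple flow. The function names, the example addresses and ports, and especially `stand_in_hash` are hypothetical. Real routers use vendor-specific, often undocumented hash functions and seeds, so CRC32 over the 5-tuple only stands in for "some deterministic hash" to show how the flows might spread. This sketch builds and buckets flows; it does not send packets.

```python
import zlib
from collections import Counter
from typing import List, Tuple

# A flow 5-tuple: (src_ip, dst_ip, protocol, src_port, dst_port)
FiveTuple = Tuple[str, str, int, int, int]

UDP = 17  # IP protocol number for UDP


def build_flows(src_ip: str, dst_ip: str, dst_port: int,
                first_src_port: int, count: int) -> List[FiveTuple]:
    """Vary only the source port so each probe is a distinct 5-tuple flow."""
    if first_src_port + count - 1 > 65535:
        raise ValueError("source port range runs past 65535")
    return [(src_ip, dst_ip, UDP, first_src_port + i, dst_port)
            for i in range(count)]


def stand_in_hash(flow: FiveTuple, n_paths: int) -> int:
    """HYPOTHETICAL stand-in for a router's ECMP/LAG hash (CRC32 mod n)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_paths


def path_coverage(flows: List[FiveTuple], n_paths: int) -> Counter:
    """Count how many flows land on each path index under the stand-in hash."""
    return Counter(stand_in_hash(f, n_paths) for f in flows)


if __name__ == "__main__":
    flows = build_flows("192.0.2.10", "198.51.100.20", 5001, 33000, 256)
    counts = path_coverage(flows, n_paths=4)
    print(dict(sorted(counts.items())))
    print("paths with no flows:", [p for p in range(4) if counts[p] == 0])
```

In a real test, the per-flow results (sent/received, latency) would be recorded per 5-tuple. A flow that consistently loses packets then points at whichever member that tuple hashes to on the real device.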

It's tricky to know the hashing algorithm for every type of device you have in your network, and for each traffic type 
on each device type, if you have a multi-vendor network. And if your network carries a mix of IPv4, IPv6, PPP, MPLS L3 
VPNs, MPLS L2 VPNs, GRE, GTP, IPsec, etc., the number of test permutations you need to run, and of result sets you 
need to parse, grows very rapidly.
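The growth in test permutations comes from taking the product of every dimension, so each added dimension multiplies the case count. A minimal sketch: the traffic types are the ones listed in the post, while the device inventory and the 64 flow variants per case are hypothetical placeholders.

```python
from itertools import product
from typing import List, Tuple

# Traffic types listed in the post above.
TRAFFIC_TYPES = ["IPv4", "IPv6", "PPP", "MPLS L3 VPN", "MPLS L2 VPN",
                 "GRE", "GTP", "IPsec"]

# HYPOTHETICAL device inventory: substitute your own platforms.
DEVICE_TYPES = ["vendor-a-core", "vendor-a-edge", "vendor-b-core"]


def build_test_matrix(devices: List[str], traffic_types: List[str],
                      flows_per_case: int) -> List[Tuple[str, str, int]]:
    """One test case per (device type, traffic type, flow variant)."""
    return list(product(devices, traffic_types, range(flows_per_case)))


if __name__ == "__main__":
    for n_devices in range(1, len(DEVICE_TYPES) + 1):
        matrix = build_test_matrix(DEVICE_TYPES[:n_devices], TRAFFIC_TYPES, 64)
        print(n_devices, "device type(s):", len(matrix), "test cases")
```

Every case also produces a result set to collect and interpret, which is the orchestration burden mentioned earlier.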

Cheers,
James.
