nanog mailing list archives

Re: Validating multi-path in production?

From: Tom Beecher <beecher () beecher cc>
Date: Mon, 15 Nov 2021 09:07:46 -0500

It sounds like you want something like this:

https://github.com/facebookarchive/fbtracert

We have an internal tool that works on generally similar principles, and it
works pretty well.

( I have no relationship with Facebook; I just always remember their presos
on UDPinger and FBTracert from my first NANOG meeting for whatever reason.
:) )

On Sun, Nov 14, 2021 at 11:21 AM Adam Thompson <athompson () merlin mb ca>
wrote:

The problem I'm looking to solve is the logical opposite, I think: I want
to demonstrate that no links are malfunctioning in such a way that packets
on a certain path are getting silently dropped.  Which has some "proving a
negative" aspects to it, unfortunately.
I think the only way I can demonstrate it is to determine that every
single multi-path/hashed-member link is working, which is... hard.
Especially if I need to deal with the combinatoric explosion - I *think* I
can skip that part.
-Adam

------------------------------
*From:* James Bensley <jwbensley+nanog () gmail com>
*Sent:* Sunday, November 14, 2021 5:29:25 AM
*To:* Adam Thompson <athompson () merlin mb ca>; nanog <nanog () nanog org>
*Subject:* Re: Validating multi-path in production?

On Fri, 12 Nov 2021 at 16:54, Adam Thompson <athompson () merlin mb ca>
wrote:

The best I've come up with so far is to have two test systems (typically
VMs) that use adjacent IP addresses and adjacent MAC addresses, and test
both inbound and outbound to/from those, blindly trusting/hoping that
hashing algorithms will *probably* exercise both paths.


If the goal is to test that traffic *is* being distributed across multiple
links based on traffic headers, then you can definitely roll your own. I
think the problem is orchestrating it (feeding your topology data into the
tool, running the tool, getting the results out, and interpreting the
results etc).

A couple of public examples:
https://github.com/facebookarchive/UdpPinger
https://www.youtube.com/watch?v=PN-4JKjCAT0

If you do roll your own, you need to tailor the tests to your topology and
your equipment. For example, you can have two VMs as you mentioned, each at
opposite ends of the network. Then, if your network uses a 5-tuple for ECMP
inside the core, you could send many flows between the two VMs, rotating
the source port, to ensure all links in a LAG or all ECMP paths are used.
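
A minimal sketch of the "rotate the source port" idea, assuming plain UDP
over IPv4 and a small echo responder running on the far-end VM. The names
run_echo_server and probe_flows are hypothetical helpers made up for this
illustration; they are not taken from UdpPinger or fbtracert.

```python
import socket
import threading


def run_echo_server(bind_addr, stop_event):
    """Tiny UDP echo responder for the far-end VM (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(bind_addr)
    srv.settimeout(0.2)  # wake up regularly so stop_event is noticed

    def loop():
        while not stop_event.is_set():
            try:
                data, peer = srv.recvfrom(2048)
            except socket.timeout:
                continue
            srv.sendto(data, peer)  # echo the payload back to the sender
        srv.close()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return srv.getsockname(), thread


def probe_flows(dst, src_ports, src_ip="", timeout=1.0):
    """Send one UDP probe per source port; return {src_port: reply_seen}.

    Each source port is a distinct 5-tuple, so a device hashing on the
    5-tuple may place each probe on a different LAG member or ECMP path.
    A failed bind or a timeout is reported as False.
    """
    results = {}
    for port in src_ports:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            try:
                s.bind((src_ip, port))
                payload = f"probe-{port}".encode()
                s.sendto(payload, dst)
                data, _ = s.recvfrom(2048)
                results[port] = (data == payload)
            except OSError:
                results[port] = False
    return results


# Example use (addresses are placeholders from the documentation range):
#   far end:  run_echo_server(("0.0.0.0", 7000), threading.Event())
#   near end: lost = [p for p, ok in
#                     probe_flows(("192.0.2.10", 7000),
#                                 range(40000, 40256)).items() if not ok]
```

A source port that consistently gets no reply while its neighbours do is
a hint that one hashed member is dropping traffic. A single lost probe
proves little, so repeat the run before drawing conclusions. This only
covers one direction of hashing per probe leg, and it says nothing about
which physical link a given port maps to.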

It's tricky to know the hashing algo for every type of device you have in
your network, and for each traffic type for each device type, if you have a
multi vendor network. Also, if your network carries a mix of IPv4, IPv6,
PPP, MPLS L3 VPNs, MPLS L2 VPNs, GRE, GTP, IPSEC, etc., the number of
permutations of tests you need to run, and of result sets you need to
parse, grows very rapidly.
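
To put a rough number on that growth, here is a small sketch that
enumerates a test matrix. The device names and the flows-per-case figure
are made-up assumptions for illustration; only the traffic types come
from the list above.

```python
from itertools import product

# Hypothetical inventory: three device types with different hashing.
device_types = ["vendor-a", "vendor-b", "vendor-c"]
# Traffic types named above.
traffic_types = ["IPv4", "IPv6", "PPP", "MPLS L3 VPN",
                 "MPLS L2 VPN", "GRE", "GTP", "IPsec"]
# Assumed number of distinct flows sent per (device, traffic) pair.
flows_per_case = 64


def build_matrix(devices, traffic, flows):
    """Return every (device, traffic type, flow index) test case."""
    return [(d, t, f)
            for d, t in product(devices, traffic)
            for f in range(flows)]


cases = build_matrix(device_types, traffic_types, flows_per_case)
print(len(cases))  # 3 * 8 * 64 = 1536
```

Every extra dimension (direction, VLAN, link count per bundle) multiplies
that total again, which is why the orchestration and result parsing tend
to be the hard part.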

Cheers,
James.

Current thread:

Validating multi-path in production? Adam Thompson (Nov 12)
- Re: Validating multi-path in production? Jeff Tantsura (Nov 12)
  - Re: Validating multi-path in production? Mark Tinka (Nov 26)
- Re: Validating multi-path in production? Saku Ytti (Nov 13)
- Re: Validating multi-path in production? James Bensley (Nov 14)
  - Re: Validating multi-path in production? Adam Thompson (Nov 14)
    - Re: Validating multi-path in production? Martijn Schmidt via NANOG (Nov 14)
    - Re: Validating multi-path in production? Tom Beecher (Nov 15)