nanog mailing list archives

Re: MTU to CDN's


From: Ruairi Carroll <ruairi.carroll () gmail com>
Date: Fri, 19 Jan 2018 14:13:04 +0000

On 19 January 2018 at 13:48, Mike Hammett <nanog () ics-il net> wrote:

Other than people improperly blocking ICMP, when does PMTUD not work?
Honest question, not troll.


It can break under _certain_ scenarios with Anycast.

It can break under _certain_ scenarios in v6 with ECMP.

It can break across an LB in L4 mode, when a real server behind the LB
has an unexpected MSS.

None of these scenarios is the norm, obviously, but PMTUD does have
some edge cases.
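
One commonly cited mechanism behind the Anycast and ECMP cases (not spelled
out above, so treat it as an illustration rather than the exact scenarios
meant here) is that the ICMP "Packet Too Big" message is sourced from a
router in the path, not from the client, so a hash over the outer header can
deliver it to a different node than the one holding the TCP connection. A
minimal sketch, assuming a toy hash over (source, destination, protocol);
the function names and the documentation-prefix addresses are hypothetical:

```python
import hashlib


def pick_path(src: str, dst: str, proto: str, n_paths: int) -> int:
    """Toy ECMP: hash the outer header fields and pick one of n_paths.

    Real routers use vendor-specific hashes; SHA-256 is only a stand-in.
    """
    key = f"{src}|{dst}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_paths


def icmp_reaches_same_node(client: str, router: str, service: str,
                           n_paths: int) -> bool:
    """True if the router's ICMP error hashes to the same path (and so the
    same anycast node or backend) as the client's TCP flow."""
    tcp_path = pick_path(client, service, "tcp", n_paths)
    icmp_path = pick_path(router, service, "icmp", n_paths)
    return tcp_path == icmp_path


if __name__ == "__main__":
    # Hypothetical addresses from the 2001:db8::/32 documentation prefix.
    client, service = "2001:db8::1", "2001:db8:aaaa::53"
    for i in range(1, 9):
        router = f"2001:db8:ffff::{i:x}"
        print(router, icmp_reaches_same_node(client, router, service, 4))
```

Whenever the function returns False, the node that needs to lower its
packet size never sees the ICMP message, and the flow stalls on large
packets.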

/Ruairi






-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com

Midwest-IX
http://www.midwest-ix.com

----- Original Message -----

From: "Mikael Abrahamsson" <swmike () swm pp se>
To: "Michael Crapse" <michael () wi-fiber io>
Cc: "NANOG list" <nanog () nanog org>
Sent: Friday, January 19, 2018 1:22:02 AM
Subject: Re: MTU to CDN's

On Thu, 18 Jan 2018, Michael Crapse wrote:

I don't mind letting the client premises routers break down 9000 byte
packets. My ISP controls end to end connectivity. 80% of people even let
our techs change settings on their computer, this would allow me to give
~5% increase in speeds, and less network congestion for end users for a one
time $60 service many people would want. It's also where the internet
should be heading... Not to beat a dead horse (re: IPv6) but why hasn't the
entire internet just moved to 9000 (or 9600 L2) byte MTU? It was created for
the jump to gigabit... That's 4 orders of magnitude ago. The internet
backbone shouldn't be shuffling around 1500 byte packets at 1 Tbps. That
means if you want to layer 3 that data, you need a router capable of more
than half a billion packets/s forwarding capacity. On the other hand, with
even just a 9000 byte MTU, TCP/IP overhead is reduced 6 fold, and
forwarding capacity needs just 100 or so Mpps capacity. Routers that
forward at that rate are found for less than $2k.
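
The packet-rate and overhead arithmetic in the paragraph above can be
checked with a quick calculation. This is a rough sketch under stated
assumptions: every packet is exactly MTU-sized, Ethernet framing (preamble,
inter-frame gap, FCS) is ignored, and the per-packet header is taken as 40
bytes (IPv4 plus TCP without options). The function names are hypothetical.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link with fixed-size packets.

    Ignores Ethernet framing overhead, so real figures are slightly lower.
    """
    return link_bps / (packet_bytes * 8)


def header_overhead(mtu: int, header_bytes: int = 40) -> float:
    """Fraction of each MTU-sized packet spent on TCP/IP headers.

    header_bytes=40 assumes IPv4 (20) + TCP (20) with no options.
    """
    return header_bytes / mtu


if __name__ == "__main__":
    for mtu in (1500, 9000):
        pps = packets_per_second(1e12, mtu)  # 1 Tbps link
        print(f"MTU {mtu}: {pps / 1e6:.1f} Mpps, "
              f"header overhead {header_overhead(mtu):.2%}")
```

Results depend heavily on the packet-size mix; small packets push the
required rate far above the full-size figure.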

As usual, there are 5-10 (or more) factors playing into this. Some, in
random order:

1. IEEE hasn't standardised > 1500 byte ethernet packets
2. DSL/WIFI chips typically don't support > ~2300 because reasons.
3. Because of 2, most SoC ethernet chips don't either
4. There is no standardised way to understand/probe the L2 MTU to your
next hop (ARP/ND and probing if the value actually works)
5. PMTUD doesn't always work.
6. PLPMTUD generally hasn't been implemented in either protocols or
hosts.
7. Some implementations have been optimized to work on packets < 2000
bytes and would actually perform worse if they had to support larger
packets (they allocate 2k of buffer memory per packet); 9k fits poorly
into 2^X sizes
8. Because of all the above reasons, mixed-MTU LAN doesn't work, and it's
going to be mixed-MTU unless you control all devices (which is typically
not the case outside of the datacenter).
9. The PPS problem in hosts and routers was solved by hardware offloading
to NICs and by forwarding NPUs/ASICs with very high lookup speeds, so PPS
is no longer a big problem.
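
To illustrate the buffer-size point in item 7, here is a small sketch. It
assumes, purely for illustration, that an implementation rounds each packet
buffer up to the next power of two; real drivers and allocators vary, and
the function names are hypothetical.

```python
def pow2_buffer(frame_bytes: int) -> int:
    """Smallest power-of-two buffer size that holds frame_bytes."""
    size = 1
    while size < frame_bytes:
        size *= 2
    return size


def wasted_fraction(frame_bytes: int) -> float:
    """Share of the power-of-two buffer left unused by one frame."""
    buf = pow2_buffer(frame_bytes)
    return (buf - frame_bytes) / buf


if __name__ == "__main__":
    for frame in (1500, 9000):
        print(f"{frame} bytes -> {pow2_buffer(frame)} byte buffer, "
              f"{wasted_fraction(frame):.1%} unused")
```

Under this assumption a 1500 byte frame fits a 2048 byte buffer, while a
9000 byte frame needs a 16384 byte buffer and leaves a much larger share of
it unused.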

On the value to choose for "large MTU", 9000 for edge and 9180 for core is
what I advocate, after a non-trivial amount of looking into this. All major
core routing platforms work with 9180 (with JunOS only supporting this
after 2015 or something). So if we want to standardise on an MTU that all
devices should support, then it's 9180, but we'd typically use 9000 in RA
to send to devices.

If we want a higher MTU to be deployable across the Internet, we need to
make it incrementally deployable. Some key things to achieve that:

1. Get something like
https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented.
2. Go to the IETF and get a document published that advises all protocols
to support PLPMTUD (RFC 4821)

Item 1 is to enable mixed-MTU LANs.
Item 2 is to enable large-MTU hosts to actually be able to communicate when
PMTUD doesn't work.
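
As an illustration of the probing idea behind PLPMTUD, here is a simplified
sketch. It is not the RFC 4821 procedure itself: it assumes a hypothetical
probe(size) callback that returns True when a packet of that size is
acknowledged end to end, that the lower bound always works, and that success
is monotonic in size. The binary search is one possible search strategy
chosen for brevity.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool],
                    low: int = 1280, high: int = 9000) -> int:
    """Return the largest size in [low, high] for which probe() succeeds.

    Assumes probe(low) would succeed and that any size above the real path
    MTU fails. No ICMP is needed: only end-to-end acknowledgement.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always advances
        if probe(mid):
            low = mid       # mid got through; try larger
        else:
            high = mid - 1  # mid was lost; the path MTU is below mid
    return low


if __name__ == "__main__":
    # Simulated path that silently drops anything above 1500 bytes.
    print(search_path_mtu(lambda size: size <= 1500))
```

A real implementation also has to tell probe loss from congestion loss and
re-probe over time as paths change.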

With this in place (wait ~10 years), larger MTU is now incrementally
deployable, which means it'll be deployable on the Internet, and IEEE might
actually agree to standardise > 1500 byte packets for ethernet.

--
Mikael Abrahamsson email: swmike () swm pp se



