nanog mailing list archives

Re: MTU to CDN's


From: Mike Hammett <nanog () ics-il net>
Date: Fri, 19 Jan 2018 07:48:07 -0600 (CST)

Other than people improperly blocking ICMP, when does PMTUD not work? Honest question, not troll. 




----- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

----- Original Message -----

From: "Mikael Abrahamsson" <swmike () swm pp se> 
To: "Michael Crapse" <michael () wi-fiber io> 
Cc: "NANOG list" <nanog () nanog org> 
Sent: Friday, January 19, 2018 1:22:02 AM 
Subject: Re: MTU to CDN's 

On Thu, 18 Jan 2018, Michael Crapse wrote: 

I don't mind letting the client premises routers break down 9000 byte 
packets. My ISP controls end to end connectivity. 80% of people even let 
our techs change settings on their computer, this would allow me to give 
~5% increase in speeds, and less network congestion for end users for a one 
time $60 service many people would want. It's also where the internet 
should be heading... Not to beat a dead horse (re: IPv6), but why hasn't the 
entire internet just moved to a 9000 (or 9600 L2) byte MTU? It was created for 
the jump to gigabit... That's 4 orders of magnitude ago. The internet 
backbone shouldn't be shuffling around 1500-byte packets at 1 Tbps. That 
means if you want to layer 3 that data, you need a router capable of more 
than half a billion packets/s forwarding capacity. On the other hand, with 
even just a 9000 byte MTU, TCP/IP overhead is reduced 6 fold, and 
forwarding needs just 100 or so Mpps of capacity. Routers that 
forward at that rate are found for less than $2k. 
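
The packet-rate argument above can be sanity-checked with a few lines of
arithmetic. This is only a rough sketch under assumptions that are not in
the original message: every packet is exactly MTU-sized, Ethernet framing
overhead is ignored, and headers are 40 bytes (IPv4 + TCP, no options).
The function names are illustrative.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link with fixed-size packets."""
    return link_bps / (packet_bytes * 8)


def header_overhead(mtu_bytes: int, header_bytes: int = 40) -> float:
    """Fraction of each MTU-sized packet taken up by IPv4 + TCP headers."""
    return header_bytes / mtu_bytes


if __name__ == "__main__":
    link = 1e12  # 1 Tbps, the backbone rate mentioned above
    for mtu in (1500, 9000):
        pps = packets_per_second(link, mtu)
        print(f"MTU {mtu}: {pps / 1e6:.1f} Mpps, "
              f"header overhead {header_overhead(mtu):.2%}")
```

Under these assumptions the 6-fold reduction holds for both packet rate
and header overhead, though the absolute packet rates come out lower than
the figures quoted above. Real traffic with many small packets would push
them up.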

As usual, there are 5-10 (or more) factors playing into this. Some, in 
random order: 

1. IEEE hasn't standardised > 1500 byte ethernet packets 
2. DSL/WiFi chips typically don't support > ~2300 bytes, because reasons. 
3. Because of 2, most SoC ethernet chips don't either 
4. There is no standardised way to understand/probe the L2 MTU to your 
next hop (ARP/ND and probing if the value actually works) 
5. PMTUD doesn't always work. 
6. PLPMTUD generally hasn't been implemented in either protocols or 
hosts. 
7. Some implementations have been optimized to work on packets < 2000 
bytes and actually perform worse if they have to support larger 
packets (they allocate 2k of buffer memory per packet); 9k is 
ill-fitting across 2^X values 
8. Because of all the above reasons, mixed-MTU LAN doesn't work, and it's 
going to be mixed-MTU unless you control all devices (which is typically 
not the case outside of the datacenter). 
9. The PPS problem in hosts and routers was solved by hardware offloading 
to NICs and by forwarding NPUs/ASICs with very high lookup speeds, so PPS 
is no longer a big problem. 
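
To illustrate point 7, here is a small sketch of how badly a 9000-byte
packet fits power-of-two buffers. It assumes an allocator that rounds
every packet buffer up to the next power of two, which is an assumption
made for illustration and not a description of any particular driver or
chip. The function names are hypothetical.

```python
def pow2_buffer(size_bytes: int) -> int:
    """Smallest power-of-two buffer size that holds size_bytes (>= 1)."""
    return 1 << (size_bytes - 1).bit_length()


def wasted_fraction(size_bytes: int) -> float:
    """Share of the power-of-two buffer left unused by one packet."""
    buf = pow2_buffer(size_bytes)
    return (buf - size_bytes) / buf


if __name__ == "__main__":
    for size in (1500, 9000):
        print(f"{size}-byte packet -> {pow2_buffer(size)}-byte buffer, "
              f"{wasted_fraction(size):.0%} unused")
```

With these assumptions a 1500-byte packet lands in a 2k buffer, while a
9000-byte packet needs a 16k buffer and leaves a much larger share of it
unused.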

On the value to choose for "large MTU", 9000 for edge and 9180 for core is 
what I advocate, after a non-trivial amount of looking into this. All major 
core routing platforms work with 9180 (with JunOS only supporting this 
after 2015 or something). So if we'd want to standardise on an MTU that all 
devices should support, then it's 9180, but we'd typically use 9000 in RA 
to send to devices. 

If we want a higher MTU to be deployable across the Internet, we need to 
make it incrementally deployable. Some key things to achieve that: 

1. Get something like 
https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented. 
2. Go to the IETF and get a document published that advises all protocols 
to support PLPMTUD (RFC 4821) 

Item 1 is to enable mixed-MTU LANs. 
Item 2 is to enable large-MTU hosts to actually be able to communicate 
when PMTUD doesn't work. 
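
As a rough illustration of what PLPMTUD does, the sketch below searches
for the largest packet size that gets through by probing, without relying
on ICMP. Binary search is just one possible strategy and is a
simplification here: there is no handling of probe loss, retries, or
timers. The `probe` callback and the simulated path are hypothetical
stand-ins for sending a real probe packet and waiting for an
acknowledgement.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool],
                    low: int = 1280, high: int = 9000) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    `low` must be a size already known to work; `high` is the largest
    size worth trying (for example, the local interface MTU).
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid        # probe got through: the path MTU is at least mid
        else:
            high = mid - 1   # probe was lost: the path MTU is below mid
    return low


def make_simulated_path(path_mtu: int) -> Callable[[int], bool]:
    """Hypothetical path that silently drops anything above path_mtu."""
    return lambda size: size <= path_mtu


if __name__ == "__main__":
    for mtu in (1500, 4470, 9000):
        print(mtu, search_path_mtu(make_simulated_path(mtu)))
```

The point of the sketch is that the sender only needs to know whether a
probe of a given size arrived, so it keeps working when ICMP is blocked
along the path.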

With this in place (wait ~10 years), a larger MTU becomes incrementally 
deployable, which means it'll be deployable on the Internet, and IEEE might 
actually agree to standardise > 1500 byte packets for ethernet. 

-- 
Mikael Abrahamsson email: swmike () swm pp se 

