nanog mailing list archives

Re: buffer bloat and packet pacing


From: Brett Frankenberger <rbf+nanog () panix com>
Date: Thu, 3 Sep 2015 09:36:38 -0500

On Thu, Sep 03, 2015 at 01:04:34PM +0100, Nick Hilliard wrote:
> On 03/09/2015 11:56, Saku Ytti wrote:
> > 40GE server will flood the window as fast as it can, instead of
> > limiting itself to 10Gbps, optimally it'll send at linerate.
>
> optimally, but tcp slow start will generally stop this from happening on
> well behaved sending-side stacks so you end up ramping up quickly to path
> rate rather than egress line rate from the sender side.  Also, regardless
> of an individual flow's buffering requirements, the intermediate path will
> be catering for large numbers of flows, so while it's interesting to talk
> about 375MB of intermediate path buffers, this is shared buffer space and
> any attempt on the part of an individual sender to (ab)use the entire path
> buffer will end up causing RED/WRED for everyone else.
>
> Otherwise, this would be a fascinating talk if people had real world data.

The original analysis is flawed because it assumes latency is constant.
Any analysis has to include the fact that buffering changes latency.

If you start with a 300ms path (by propagation delay, switching latency,
etc.), and 375MB of buffers on a 10G port, then, when the buffers
fill, you end up with a 600ms path[1].  And a 375MB window is no longer
sufficient to keep the pipe full.

Instead, you need a 750MB buffer.

But now the latency is 900ms.

And so on.  This doesn't converge.  Every byte of filled buffer is
another byte you need in the window if you're going to fill the pipe.
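
A rough sketch of that arithmetic in Python, for anyone who wants to
play with the numbers.  The 10G port, 300ms path, and 375MB starting
point are the figures from this thread; the function names are made up
for illustration, and MB is taken as 10^6 bytes, as in footnote [1].

```python
LINK_BPS = 10e9      # 10G port (figure from the thread)
BASE_RTT_S = 0.300   # 300 ms path with empty buffers (figure from the thread)


def rtt_with_full_buffer(buffer_bytes, base_rtt_s=BASE_RTT_S, link_bps=LINK_BPS):
    """Path latency once buffer_bytes of queue are full: base latency plus drain time."""
    return base_rtt_s + buffer_bytes * 8 / link_bps


def window_for_rtt(rtt_s, link_bps=LINK_BPS):
    """Bytes of window needed to keep the link busy for one rtt_s (bandwidth*delay)."""
    return rtt_s * link_bps / 8


def iterate(steps=3):
    """Repeat the argument: fill the buffer, recompute latency, recompute the window."""
    rows = []
    size = window_for_rtt(BASE_RTT_S)  # 375 MB for the unloaded 300 ms path
    for _ in range(steps):
        rtt = rtt_with_full_buffer(size)
        needed = window_for_rtt(rtt)
        rows.append((size, rtt, needed))
        size = needed  # assume the buffer is grown to match the new window
    return rows


if __name__ == "__main__":
    for size, rtt, needed in iterate():
        print(f"{size / 1e6:.0f} MB queued -> {rtt * 1e3:.0f} ms path "
              f"-> {needed / 1e6:.0f} MB window needed")
```

The first two rows reproduce the 600ms / 750MB and 900ms steps above,
and each further step adds another 300ms and 375MB.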

Not accounting for this is part of the reason the original analysis is
flawed.  The end result is that you always run out of window or run out
of buffer (causing packet loss).

Here's a paper that shows you don't need buffers equal to
bandwidth*delay to get near capacity:
http://www.cs.bu.edu/~matta/Papers/hstcp-globecom04.pdf
(I'm not endorsing it.  Just pointing it out as a data point.)

     -- Brett

[1] 0.300s + (375E6 bytes * 8 bits/byte) / 10E9 bps = 0.600s = 600ms

