nanog mailing list archives

Re: 923Mbits/s across the ocean


From: Iljitsch van Beijnum <iljitsch () muada com>
Date: Tue, 11 Mar 2003 00:41:15 +0100 (CET)


On Sun, 9 Mar 2003, Richard A Steenbergen wrote:

> On the send side, the application transmitting is guaranteed to utilize
> the buffers immediately (ever seen a huge jump in speed at the beginning
> of a transfer? That is the local buffer being filled, and the application
> has no way of knowing whether this data is going out onto the wire or
> just into the kernel). Then the network must drain the packets onto the
> wire, sometimes very slowly (think about a dialup user downloading from
> your GigE server).

Actually this is often way too fast, as the congestion window doubles
every round-trip time during slow start. This means that with a large
buffer (and hence a large window) and a bottleneck somewhere along the
way, you are almost guaranteed to get some serious congestion in the
early stages of the session, and lower levels of congestion periodically
later on, whenever TCP tries to figure out how large the congestion
window can get without losing packets.

This is the part about TCP that I've never understood: why does it send
large numbers of packets back-to-back? This is almost never a good idea.
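
To put rough numbers on it, here is a back-of-the-envelope sketch of my
own (the 1460-byte MSS, 2-segment initial window and 2 MB buffer are
just assumptions for illustration): the congestion window covers the
whole buffer after about ten round trips, at which point TCP considers
well over a thousand segments acceptable to have in flight at once.

/* Illustrative sketch only: how fast slow start ramps up when the
 * window is large.  Assumes a 1460-byte MSS, an initial window of
 * 2 segments and a 2 MB send buffer.
 */
#include <stdio.h>

int main(void)
{
    const double mss  = 1460.0;
    const double wmax = 2048.0 * 1024.0;   /* 2 MB buffer = max window */
    double cwnd = 2 * mss;                  /* initial congestion window */
    int rtt = 0;

    while (cwnd < wmax) {
        printf("RTT %2d: cwnd = %8.0f bytes (%4.0f segments in flight)\n",
               rtt, cwnd, cwnd / mss);
        cwnd *= 2;                          /* slow start: doubles per RTT */
        rtt++;
    }
    printf("window covers the whole buffer after ~%d round trips\n", rtt);
    return 0;
}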

> On the receive side, the socket buffers must be large enough to
> accommodate all the data received between application read()s,

That's not true. It's perfectly acceptable for TCP to stall when the
receiving application fails to read the data fast enough. (TCP then
simply announces a window of 0 to the other side so the communication
effectively stops until the application reads some data and a >0 window
is announced.) If not, the kernel would be required to buffer unlimited
amounts of data in the event an application fails to read it from the
buffer for some time (which is a very common situation).
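
You can watch this happen with a trivial sketch (my own; the port
number and buffer size are arbitrary): listen on a socket, accept a
connection and simply never read from it, then blast data at it from
another host. Once the receive buffer fills, tcpdump shows the
advertised window drop to zero, and the sender stalls as soon as its
own send buffer is full.

/* Sketch of a receiver that provokes a zero window.  Error checking
 * omitted for brevity; the small 32 kB buffer makes the stall quick.
 */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    int rcvbuf = 32 * 1024;
    struct sockaddr_in sin;

    setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    memset(&sin, 0, sizeof(sin));
    sin.sin_family      = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port        = htons(9999);

    bind(s, (struct sockaddr *)&sin, sizeof(sin));
    listen(s, 1);

    if (accept(s, NULL, NULL) >= 0) {
        /* never read(): the kernel buffers ~32 kB, then advertises a
         * zero window, and the peer's write()s eventually block      */
        for (;;)
            pause();
    }
    return 0;
}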

> locally. Jumbo frames help too, but their real benefit is not the
> simplistic "hey look, there's 1/3rd the number of frames/sec" view that
> many people take. The good stuff comes from techniques like page
> flipping, where the NIC DMAs data into a memory page which can be
> flipped through the system straight to the application without being
> copied along the way. Some day TCP may just be implemented on the NIC
> itself, with ALL work offloaded, and the system doing nothing but
> receiving nice page-sized chunks of data at high speed.

Hm, I don't see this happening to a usable degree, as TCP has no concept
of records. You really want to work with fixed-size chunks of
information here rather than pretending everything is a stream.
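
The usual answer is to layer records on top of the stream yourself,
e.g. with a length prefix. A rough sketch of my own (the helper names
and the 4-byte network-order prefix are arbitrary choices):

/* Imposing records on a TCP stream with a length prefix.  readn()
 * loops because TCP is free to deliver the bytes in arbitrary chunks.
 */
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>   /* ntohl() */

static ssize_t readn(int fd, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n <= 0)
            return n;                /* error or EOF mid-record */
        done += n;
    }
    return (ssize_t)done;
}

/* One record = 4-byte network-order length, then that many bytes. */
static ssize_t read_record(int fd, void *buf, size_t bufsize)
{
    uint32_t len;

    if (readn(fd, &len, sizeof(len)) <= 0)
        return -1;
    len = ntohl(len);
    if (len > bufsize)
        return -1;                   /* record too large for caller */
    return readn(fd, buf, len);
}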

> IMHO the 1500-byte MTU of Ethernet will continue to prevent good
> end-to-end performance like this for a long time to come. But alas, I
> digress...

Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
to support a per-neighbor MTU? That would make backward-compatible
adoption of jumbo frames a possibility. (Maybe retrofit ND into v4
while we're at it.)

Iljitsch van Beijnum

