nanog mailing list archives

RE: packet reordering at exchange points


From: "Kavi, Prabhu" <prabhu_kavi () tenornetworks com>
Date: Tue, 9 Apr 2002 13:58:11 -0400


An interesting historical observation:

Many years ago, when I created discrete event simulation network models
for a living, I had one project to model (what was then) a widely
implemented PC TCP stack.  I remember that one wart of this
implementation was that when packet reordering occurred, it collapsed
the window size to 1!

Anyone know if strange warts like this still exist in desktop 
systems?
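For anyone who hasn't seen this class of wart: a toy Python sketch, purely
illustrative and not based on that stack's actual code, of the difference
between a sender that collapses its window at the first sign of reordering
and one that waits for the conventional three duplicate ACKs. "Window" is
read here as the congestion window; the threshold and the halving are the
usual fast-retransmit behaviour, not anything from the old implementation.

DUPACK_THRESHOLD = 3   # conventional fast-retransmit trigger

def old_wart_stack(cwnd, dup_acks):
    # Hypothetical: any hint of reordering (a single duplicate ACK)
    # collapses the window to one segment.
    return 1 if dup_acks >= 1 else cwnd

def threshold_stack(cwnd, dup_acks):
    # Conventional behaviour: react only after three duplicate ACKs,
    # and then halve the window rather than collapsing it.
    return max(cwnd // 2, 1) if dup_acks >= DUPACK_THRESHOLD else cwnd

for dup_acks in range(4):
    print(dup_acks, old_wart_stack(16, dup_acks), threshold_stack(16, dup_acks))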

Prabhu
----------------------------------------------------------------------
Prabhu Kavi                     Phone:  1-978-264-4900 x125 
Director, Adv. Prod. Planning   Fax:    1-978-264-0671
Tenor Networks                  Email:  prabhu_kavi () tenornetworks com
100 Nagog Park                  WWW:    www.tenornetworks.com
Acton, MA 01720


-----Original Message-----
From: Iljitsch van Beijnum [mailto:iljitsch () muada com]
Sent: Tuesday, April 09, 2002 12:36 PM
To: Stephen Sprunk
Cc: nanog () merit edu
Subject: Re: packet reordering at exchange points



On Mon, 8 Apr 2002, Stephen Sprunk wrote:

> Thus spake "Iljitsch van Beijnum" <iljitsch () muada com>
> > But how is packet reordering on two parallel gigabit interfaces
> > ever going to translate into reordered packets for individual
> > streams?

> Think of a large FTP between two well-connected machines.  Such flows
> tend to generate periodic clumps of packets; split one of these clumps
> across two pipes and the clump will arrive out of order at the other
> end.  The resulting mess will create a clump of retransmissions, then
> another bigger clump of new data, ...

I don't think it will be this bad, even if the hosts are connected at
GigE and the trunk is 2 x GigE. In this case, a (delayed) ACK will
usually acknowledge 2 segments, so it triggers transmission of two new
segments. These arrive back to back at the router/switch doing the load
balancing. Since there is obviously a need for more than 1 Gbit worth of
bandwidth, the average queue size is likely at least close to 1 (= ~65%
line use) or even higher. If that is the case, there is a _chance_ the
second packet gains a full packet time over the first and arrives first
at the destination. However, this is NOT especially likely if both
packets are the same size: the _average_ queue sizes will be the same,
so in half the cases the first packet gains an even bigger advance over
the second, and only in a fraction of the other half does the second
packet gain enough over the first to pass it. And even then, the
destination host only sees a single packet coming in out of order, which
isn't enough to trigger fast retransmit.
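A small Python sketch of the duplicate-ACK arithmetic behind that last
sentence (toy model, assumed for illustration: segments numbered 1..n,
cumulative ACKs only, no SACK, and the usual threshold of three duplicate
ACKs before fast retransmit).

def cumulative_acks(arrival_order):
    # Cumulative ACK sent after each arriving segment; ACK k means
    # "I have everything up to and including segment k".
    received, next_expected, acks = set(), 1, []
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected - 1)
    return acks

def duplicate_ack_run(acks):
    # Longest run of ACKs repeating the same value -- what the sender
    # counts towards the fast-retransmit threshold of 3.
    longest = run = 0
    for prev, cur in zip(acks, acks[1:]):
        run = run + 1 if cur == prev else 0
        longest = max(longest, run)
    return longest

# Segments 2 and 3 swapped (the most a two-way split will normally do):
print(duplicate_ack_run(cumulative_acks([1, 3, 2, 4, 5, 6])))  # -> 1, no FR
# Segment 2 overtaken by 3, 4 and 5 (needs more than two parallel paths):
print(duplicate_ack_run(cumulative_acks([1, 3, 4, 5, 2, 6])))  # -> 3, FR fires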

You need to load balance over more than two connections to trigger
unnecessary fast retransmit (over two lines, packet #3 isn't going to
pass packet #1), AND you need to send more than two packets back to
back. Also, you need to be at the same speed as the load-balanced lines,
otherwise your packet train gets split up by traffic from other
interfaces or by idle time on the line.
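To make that "more than two connections AND more than two back-to-back
packets" condition concrete, here is a small Python sketch under
deliberately simplified assumptions (equal-size packets, equal-speed
links, all delays in whole packet times, and the only asymmetry being
that packet 0's link has a queue that is `imbalance` packet times deeper
than the others).

def overtakers(n_links, burst, imbalance):
    # Packet i of a back-to-back burst is sprayed round robin onto
    # link i % n_links; its arrival offset is the queue it finds plus
    # its position within its own link's FIFO share of the burst.
    links = [i % n_links for i in range(burst)]
    def arrival(i):
        queue = imbalance if links[i] == links[0] else 0
        return queue + sum(1 for j in range(i) if links[j] == links[i])
    return [i for i in range(1, burst) if arrival(i) < arrival(0)]

# Two links: same-link packets (2, 4, 6) can never pass packet 0, and
# each extra packet time of queue imbalance lets only one more
# other-link packet slip ahead.
for imbalance in range(4):
    print(2, imbalance, overtakers(n_links=2, burst=8, imbalance=imbalance))
    # -> [], [1], [1, 3], [1, 3, 5]

# Four links: a single packet time of imbalance already puts packets
# 1, 2 and 3 ahead of packet 0 -- enough duplicate ACKs for fast
# retransmit.
print(4, 1, overtakers(n_links=4, burst=8, imbalance=1))  # -> [1, 2, 3]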

And _then_, if all of this happens, all the retransmitted data is to the
left of the window, i.e. data the receiver already has. I'm not even
sure those packets generate an ACK, and if they do, whether the sender
takes any action on that ACK. If this triggers another round of fast
retransmit, the FR implementation should be considered broken, IMO.

> > Packets for streams that are subject to header compression or
> > for voice over IP or even Mbone are nearly always transmitted
> > at relatively large intervals, so they can't travel down parallel
> > paths simultaneously.

> RTP reordering isn't a problem in my experience, probably since RTP
> has an inherent resequencing mechanism.

My point is that real-time protocols will not see reordering unless they
are using up nearly the full line speed or there is congestion, because
these protocols don't send out packets back to back the way TCP
sometimes does. How big are VoIP packets? Even with an 80 byte payload
you get 100 packets per second = 10 ms between packets, which is more
than 80 packet times on GigE, so if those packets still end up back to
back, that means congestion. And if there is congestion, all performance
bets are off.
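For the record, the arithmetic behind those numbers, with assumed values
(an 80-byte payload every 10 ms, and a 1500-byte full-size frame as the
yardstick for a GigE packet time). With those assumptions the gap comes
out around 833 packet times, in any case well over 80.

payload_bytes = 80
packets_per_second = 100
inter_packet_gap = 1 / packets_per_second              # 0.01 s = 10 ms

gige_bps = 1_000_000_000
full_size_packet_time = 1500 * 8 / gige_bps            # ~12 microseconds

print(payload_bytes * 8 * packets_per_second)          # 64000 bit/s payload
print(inter_packet_gap / full_size_packet_time)        # ~833 packet times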

It seems to me that spending (CPU) time and money on more complex load
balancing than per-packet round robin in order to avoid reordering only
helps some people with GigE-connected hosts some of the time. Using that
time or money to overcome congestion is probably a better investment.
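For contrast, a minimal Python sketch of the "more complex" alternative
being weighed here: per-flow (hash-based) balancing keeps every packet of
a flow on one link, so a single flow is never reordered, but it also
never gets more than one link's worth of bandwidth. The field names and
the CRC32 hash are illustrative assumptions, not any particular vendor's
implementation.

import zlib

def flow_link(src_ip, dst_ip, src_port, dst_port, proto, n_links):
    # Hash the 5-tuple so every packet of a flow picks the same link.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_links

def round_robin_link(packet_index, n_links):
    # Per-packet round robin: successive packets alternate links.
    return packet_index % n_links

flow = ("10.0.0.1", "192.0.2.7", 34567, 80, "tcp")
print([flow_link(*flow, n_links=2) for _ in range(4)])      # same link, 4x
print([round_robin_link(i, n_links=2) for i in range(4)])   # [0, 1, 0, 1]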

PS. For everyone looking at their netstat -p tcp output: packet loss
    also counts towards the out-of-order packets, so it is hard to get
    the real out-of-order figures.

PS2. Isn't it annoying to have to think about layer 4 to 
build layer 2 stuff?



