nanog mailing list archives

Re: TCP time_wait and port exhaustion for servers


From: Ray Soucy <rps () maine edu>
Date: Wed, 5 Dec 2012 12:09:31 -0500

This would be outgoing connections sourced from the IP of the proxy,
destined to whatever remote website (so 80 or 443) requested by the
user.

Essentially it's a modified Squid service that is used to filter HTTP
for CIPA compliance (required by the government) to keep children in
public schools from stumbling onto inappropriate content.

Like most web traffic, the majority of these connections open and
close in under a second.  When we get to a point that there is enough
traffic from users behind the proxy to be generating over 500 new
outgoing connections per second, sustained, we start having users
experience an error where there are no local ports available to Squid
to use since they're all tied up in a TIME_WAIT state.

Here is an example of netstat totals on a box we're seeing the behavior on:

   10 LAST_ACK
   32 LISTEN
    5 SYN_RECV
    5 CLOSE_WAIT
  756 ESTABLISHED
   26 FIN_WAIT1
   40 FIN_WAIT2
    5 CLOSING
   10 SYN_SENT
481947 TIME_WAIT
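
For anyone who wants to reproduce per-state totals like the ones above
without netstat, here is a rough sketch that tallies the "st" column of
/proc/net/tcp on Linux. The function names (count_tcp_states,
count_from_proc) are made up for illustration; this is not how the
figures above were produced.

```python
from collections import Counter

# State codes as they appear in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(lines):
    """Tally sockets per TCP state from /proc/net/tcp-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Data rows start with a slot index such as "0:"; the header row
        # and blank or truncated lines are skipped.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def count_from_proc(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum the per-state tallies over the IPv4 and IPv6 socket tables."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as handle:
                total += count_tcp_states(handle)
        except OSError:
            # Table not present (non-Linux host, IPv6 disabled, ...).
            continue
    return total


if __name__ == "__main__":
    # Print smallest to largest, similar in layout to the totals above.
    for state, n in sorted(count_from_proc().items(), key=lambda kv: kv[1]):
        print(f"{n:7d} {state}")
```

Watching the TIME_WAIT line over time next to the configured port range
gives a quick idea of how close a box is to the limit.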

As a band-aid we've opened up the local port range to allow up to 50K
local ports with /proc/sys/net/ipv4/ip_local_port_range, but they're
brushing up against that limit again at peak times.
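
As a back-of-the-envelope check on why widening the range only buys a
little headroom, here is a small sketch of the arithmetic. It assumes
the worst case, where every outgoing connection holds a distinct local
port for the full TIME_WAIT period (60 seconds, TCP_TIMEWAIT_LEN in the
Linux source) no matter which destination it went to. The function
names and the example range are illustrative only.

```python
TIME_WAIT_SECONDS = 60  # TCP_TIMEWAIT_LEN on Linux; a compile-time constant


def parse_port_range(text):
    """Parse the two integers found in ip_local_port_range, e.g. '32768\\t61000'."""
    low, high = (int(part) for part in text.split())
    return low, high


def max_sustained_rate(low, high, hold_seconds=TIME_WAIT_SECONDS):
    """New connections per second before every port in [low, high] is held."""
    ports = high - low + 1
    return ports / hold_seconds


def ports_needed(rate_per_second, hold_seconds=TIME_WAIT_SECONDS):
    """Local ports tied up in steady state at a given connection rate."""
    return rate_per_second * hold_seconds


if __name__ == "__main__":
    # Hypothetical 50K-port range, chosen only to illustrate the numbers.
    low, high = parse_port_range("15000 64999")
    print(f"{high - low + 1} ports -> {max_sustained_rate(low, high):.0f} conn/s")
    print(f"500 conn/s -> {ports_needed(500)} ports held in TIME_WAIT")
```

Under these assumptions 500 connections per second keep 30,000 ports
busy, and a 50K range tops out somewhere around 830 connections per
second, so the range can only be stretched so far.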

It's a shame because memory and CPU-wise the box isn't breaking a sweat.

Enabling TW_REUSE (/proc/sys/net/ipv4/tcp_tw_reuse) doesn't seem to
have any effect in this case.
Using TW_RECYCLE drops the TIME_WAIT count to about 10K instead of
50K, but everything I read online says to avoid using TW_RECYCLE
because it will break things horribly.

Someone responded off-list saying that TIME_WAIT is controlled by
/proc/sys/net/ipv4/tcp_fin_timeout, but that is just incorrect
information that has been parroted by a lot of blogs.  There is no
relation between fin_timeout and TCP_TIMEWAIT_LEN.
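
To make the distinction concrete: the kernel documentation describes
tcp_fin_timeout as the time an orphaned connection may stay in
FIN_WAIT_2, while TCP_TIMEWAIT_LEN is a constant in the kernel source
and has no file under /proc/sys. The sketch below just dumps the knobs
mentioned in this thread; read_sysctl and the KNOBS table are
hypothetical helpers, and the descriptions in it are a summary, not
official wording.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return a sysctl value as a string, or None if it cannot be read.

    Dots in the name are mapped to path separators, which is fine for
    the simple names used here.
    """
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


# Knobs mentioned in this thread, with a short note on what each one does.
KNOBS = {
    "net.ipv4.ip_local_port_range": "local ports available for outgoing connections",
    "net.ipv4.tcp_fin_timeout": "FIN_WAIT_2 hold time for orphaned sockets, not TIME_WAIT",
    "net.ipv4.tcp_tw_reuse": "allow reuse of TIME_WAIT sockets for new outgoing connections",
    "net.ipv4.tcp_tw_recycle": "fast TIME_WAIT recycling; absent on newer kernels",
}

if __name__ == "__main__":
    for name, note in KNOBS.items():
        value = read_sysctl(name)
        shown = "(not present)" if value is None else value
        print(f"{name} = {shown}  # {note}")
```

Changing tcp_fin_timeout and then watching the TIME_WAIT count is a
simple way to see for yourself that the two are unrelated.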

This level of use seems to translate into about 250 Mbps of traffic on
average, FWIW.




On Wed, Dec 5, 2012 at 11:56 AM, JÁKÓ András <jako.andras () eik bme hu> wrote:
> Ray,
>
>> With a 60 second timeout on TIME_WAIT, local port identifiers are tied
>> up from being used for new outgoing connections (in this case a proxy
>> server).  The default local port range on Linux can easily be
>> adjusted; but even when bumped up to a range of 32K ports, the 60
>> second timeout means you can only sustain about 500 new connections
>> per second before you run out of ports.
>
> Is that 500 new connections per second per {protocol, remote address,
> remote port} tuple, that's too few for your proxy? (OK, this tuple is more
> or less equivalent with only {remote address} if we talk about a web
> proxy.) Just curious.
>
> Regards,
> András



-- 
Ray Patrick Soucy
Network Engineer
University of Maine System

T: 207-561-3526
F: 207-561-3531

MaineREN, Maine's Research and Education Network
www.maineren.net

