nanog mailing list archives

Re: Long-haul 100Mbps EPL circuit throughput issue


From: Greg Foletta <greg () foletta org>
Date: Fri, 6 Nov 2015 10:35:13 +1100

Along with the recv window/buffer sizing needed for your particular
bandwidth-delay product, it appears you're also seeing TCP move from
slow start into congestion avoidance under whichever congestion control
algorithm is in use (Reno, Tahoe, CUBIC, etc.).
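
To illustrate the slow-start to congestion-avoidance transition, here is a rough sketch of a Reno-style window in Python. It is a toy model, not what any real kernel does. The function name simulate_reno, the 100-segment path capacity, and the one-loss-per-overshoot rule are all assumptions made for illustration. It does show why throughput can ramp quickly, fall to roughly half, and then climb slowly.

```python
def simulate_reno(capacity_segments: int, rounds: int) -> list[float]:
    """Toy Reno-style congestion window, one sample per round trip.

    capacity_segments is a hypothetical stand-in for what the path
    (bandwidth-delay product plus queueing) can hold before dropping.
    """
    cwnd = 1.0                 # congestion window, in segments
    ssthresh = float("inf")    # no threshold until the first loss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity_segments:
            # Overshoot: assume one loss, halve the window.
            ssthresh = cwnd / 2
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: window doubles every round trip.
            cwnd *= 2
        else:
            # Congestion avoidance: one extra segment per round trip.
            cwnd += 1
    return history


if __name__ == "__main__":
    capacity = 100  # assumed path capacity in segments
    for rtt_index, cwnd in enumerate(simulate_reno(capacity, 60)):
        utilisation = min(cwnd, capacity) / capacity
        print(f"rtt {rtt_index:2d}  cwnd {cwnd:6.1f}  link use {utilisation:4.0%}")
```

In this model the window doubles until it overshoots, is cut in half, and then grows by one segment per round trip, so on a long-RTT path the recovery back toward line rate is slow. CUBIC grows the window differently, so treat the shape as indicative only.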

Greg Foletta
greg () foletta org


On 6 November 2015 at 10:19, alvin nanog <nanogml () mail ddos-mitigator net>
wrote:


hi eric

On 11/05/15 at 04:48pm, Eric Dugas wrote:
...
Linux test machine in customer's VRF <-> SRX100 <-> Carrier CPE (Cisco
2960G) <-> Carrier's MPLS network <-> NNI - MX80 <-> Our MPLS network <->
Terminating edge - MX80 <-> Distribution switch - EX3300 <-> Linux test
machine in customer's VRF

We can fill the link with UDP traffic using iperf, but with TCP we reach
80-90%, then the traffic drops to 50% and slowly increases back up to 90%.

if i was involved with these tests, i'd start looking for "not enough tcp
send and tcp receive buffers"

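for flooding at 100Mbit/s (about 12.5MB/s), you'd need about 12MB of
buffer for every second of round-trip time, so scale that by the actual
rtt of the circuit ...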

udp does NOT care too much about dropped data due to the buffers,
but tcp cares about "not enough buffers" .. somebody resend packet#
1357902456 :-)

at least double or triple the buffers needed to compensate for all kinds of
network wackiness:
data in transit, misconfigured hardware-in-the-path, misconfigured iperfs,
misconfigured kernels, interrupt handling, etc, etc
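
As a rough sketch of the buffer arithmetic discussed above, the following Python computes the bandwidth-delay product for a 100Mbit/s circuit and applies the "double or triple" headroom. The round-trip times in the loop are hypothetical, since the thread does not state the circuit's RTT, and the helper names bdp_bytes and suggested_buffer_bytes are made up for this example.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return bandwidth_bps / 8 * rtt_seconds


def suggested_buffer_bytes(bandwidth_bps: float, rtt_seconds: float,
                           headroom: float = 2.0) -> float:
    """BDP scaled by a headroom factor (2x or 3x, per the advice above)."""
    return bdp_bytes(bandwidth_bps, rtt_seconds) * headroom


if __name__ == "__main__":
    link_bps = 100e6  # 100Mbit/s circuit from the thread
    # Hypothetical RTTs; measure the real one with ping before sizing buffers.
    for rtt_ms in (10, 50, 100, 200):
        rtt = rtt_ms / 1000
        bdp = bdp_bytes(link_bps, rtt)
        doubled = suggested_buffer_bytes(link_bps, rtt, headroom=2.0)
        print(f"rtt {rtt_ms:4d} ms  bdp {bdp / 1e6:6.3f} MB  "
              f"2x buffer {doubled / 1e6:6.3f} MB")
```

A full 12.5MB only corresponds to one second of round-trip time, so the measured RTT of the long-haul path is the number that matters when setting the send and receive buffers on both test machines.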

- how many "iperf flows" are you also running ??
        - running dozens or 100's of them does affect thruput too

- does the same thing happen with socat ??

- if iperf and socat agree with network thruput, it's the hw somewhere

- slowly increasing thruput doesn't make sense to me ... it sounds like
something is caching some of the data

magic pixie dust
alvin

Has anyone dealt with this kind of problem in the past? We've tested by
forcing ports to 100-FD at both ends and policing the circuit on our side,
then called the carrier and escalated to L2/L3 support. They also tried to
police the circuit but, as far as I know, they didn't modify anything else.
I've told our support to have them look for underrun errors on their Cisco
switch, and they can see some. They're pretty much in the same boat as us
and aren't sure where to look.



