
Re: Long-haul 100Mbps EPL circuit throughput issue


From: alvin nanog <nanogml () Mail DDoS-Mitigator net>
Date: Thu, 5 Nov 2015 15:19:12 -0800


hi eric

On 11/05/15 at 04:48pm, Eric Dugas wrote:
...
Linux test machine in customer's VRF <-> SRX100 <-> Carrier CPE (Cisco
2960G) <-> Carrier's MPLS network <-> NNI - MX80 <-> Our MPLS network <->
Terminating edge - MX80 <-> Distribution switch - EX3300 <-> Linux test
machine in customer's VRF

We can fill the link with UDP traffic using iperf, but with TCP we can reach
80-90%, then the traffic drops to 50% and slowly increases back up to 90%.
 
if i were involved with these tests, i'd start looking for "not enough tcp send
and tcp receive buffers"

for flooding at 100Mbit/s, you'd want buffers of at least the bandwidth-delay
product; ~12MB covers a full second of data in flight ...
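
the arithmetic, as a quick python sketch (the 50 ms rtt is an assumption --
measure your actual round trip with ping):

    # bandwidth-delay product: the tcp window has to cover everything
    # in flight, or the sender stalls waiting for acks
    link_bps = 100e6            # 100 Mbit/s EPL
    rtt_s = 0.05                # assumed 50 ms round trip

    bdp = link_bps * rtt_s / 8
    print(f"BDP at 50 ms rtt: {bdp / 1e6:.2f} MB")                 # ~0.62 MB
    print(f"one full second in flight: {link_bps / 8 / 1e6:.1f} MB")  # 12.5 MB

so ~12MB is roughly one second's worth of data at 100Mbit/s -- far more than
any sane rtt needs, which is the point: headroom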

udp does NOT care too much about data dropped for lack of buffers,
but tcp cares a lot about "not enough buffers" .. somebody resend packet# 1357902456 :-)

at least double or triple the buffers needed, to compensate for all kinds of
network wackiness:
data in transit, misconfigured hardware-in-the-path, misconfigured iperfs,
misconfigured kernels, interrupt handling, etc, etc
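
a minimal python sketch of the same idea, asking for big buffers on a test
socket (the 3x multiplier and the 12.5 MB figure are the assumptions from
above; linux may clamp the request to net.core.rmem_max / net.core.wmem_max):

    import socket

    BDP = 12_500_000  # ~1 second of flight at 100 Mbit/s, per the math above

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # ask for ~3x the bdp before connecting; the kernel reports back
    # what it actually granted
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 3 * BDP)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 3 * BDP)
    print("granted sndbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("granted rcvbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))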

- how many "iperf flows" are you running ??
        - running dozens or hundreds of them affects throughput too

- does the same thing happen with socat ??

- if iperf and socat agree on the throughput, it's likely the hw somewhere in the path

- slowly increasing throughput doesn't make sense to me ... it sounds like 
something is caching some of the data
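
fwiw, drop-to-half-then-slow-climb is also exactly what tcp's own congestion
avoidance looks like: halve the window on loss, then add one segment per
round trip. a toy python model (the mss, link rate, and 50 ms rtt are all
assumptions) reproduces the shape:

    MSS = 1460                      # bytes per segment, assumed
    rtt = 0.05                      # 50 ms round trip, assumed
    capacity = 100e6 / 8 * rtt      # bytes the pipe holds per rtt (~625 KB)

    cwnd = capacity                 # start with the pipe full
    for t in range(400):            # 400 rtts ~= 20 seconds
        if cwnd >= capacity:        # pipe full -> drop -> multiplicative decrease
            cwnd /= 2               # throughput falls to ~50%
        else:
            cwnd += MSS             # additive increase: the slow climb back
        if t % 50 == 0:
            print(f"t={t * rtt:5.1f}s  utilization={min(cwnd, capacity) / capacity:5.1%}")

if the graphs look like that sawtooth, the real question is what's causing the
periodic loss up near line rate -- which points back at buffers, policers, or
underruns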

magic pixie dust
alvin

Has anyone dealt with this kind of problem in the past? We've tested by
forcing ports to 100-FD at both ends and policing the circuit on our side,
and we've called the carrier and escalated to L2/L3 support. They also tried
policing the circuit but, as far as I know, didn't modify anything else.
I've told our support to have them look for underrun errors on their Cisco
switch, and they can see some. They're pretty much in the same boat as us
and not sure where to look.


