nanog mailing list archives

Re: Latency/Packet Loss on ASR1006


From: Colin Legendre <clegendre () coextro com>
Date: Thu, 9 Dec 2021 19:17:05 -0500

Thanks for this. We turned off NetFlow export and it dropped our QFP load
from 44% to 18%. Ugh.

---
Colin Legendre



On Thu, Dec 9, 2021 at 4:22 AM Brian Turnbow via NANOG <nanog () nanog org>
wrote:



On 11/26/2021 1:09 PM, Colin Legendre wrote:
Hi,

We have an ASR1006 with the following cards:
1 x ESP40
1 x SIP40
4 x SPA-1x10GE-L-V2
1 x 6TGE
1 x RP2

We've been having latency and packet loss during peak periods...

We notice all is good until we reach 50% utilization on output of...

'show platform hardware qfp active datapath utilization summary'

Literally ... 47% good... 48% good... 49% latency to next hop goes
from 1ms to 15-20ms... 50% we see 1-2% packet-loss and 30-40ms
latency... 53% we see 60-70ms latency and 8-10% packet loss.

Is this expected? Can the ESP40 really only push 20G before it starts
to have performance issues?
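
For anyone wanting to track the load figure described above, here is a minimal
Python sketch that pulls the "Processing: Load (pct)" row out of captured
output from the command quoted earlier. The sample text, its numbers, the
function names and the 45% threshold are all assumptions for illustration
(the layout is modeled on typical IOS XE output and may differ by release);
none of it comes from the router in this thread.

```python
import re

# Illustrative sample only: layout assumed from typical IOS XE output,
# values invented for the example.
SAMPLE = """\
  CPP 0: Subdev 0            5 secs        1 min        5 min       60 min
  Processing: Load (pct)       49           47           44           40
"""

# Matches the load row and captures the four averaging-window columns.
LOAD_RE = re.compile(
    r"Processing:\s*Load\s*\(pct\)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
)


def parse_qfp_load(output):
    """Return QFP load percentages keyed by averaging window, or None."""
    match = LOAD_RE.search(output)
    if match is None:
        return None
    windows = ("5s", "1m", "5m", "60m")
    return dict(zip(windows, map(int, match.groups())))


def over_threshold(load, threshold=45):
    """List windows whose load is at or above a (hypothetical) threshold."""
    return [window for window, pct in load.items() if pct >= threshold]


if __name__ == "__main__":
    load = parse_qfp_load(SAMPLE)
    print(load, over_threshold(load))
```

The threshold is set a little below the 48-50% range where the trouble was
observed, so a warning would fire before latency climbs; adjust to taste.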


We had a similar issue about 4 years ago.
We were showing packet loss and drops getting progressively worse and the
router was falling over when reaching about 70% of usage.
We could see the interface reliability go down and input errors due to
overruns on the interfaces.
Cisco blamed it on microbursts that could not be handled under load.


"We were able to replicate this scenario in our lab as well.
QFP under high load generated input errors and overruns which in turn led
to unicast failures/ drops/ latency.
The issue is not consistent with QFP % utilization as sometimes with even
80%+ traffic, we do not see the drops:"

They recommended removing traffic or upgrading the ESP.

One of our guys disabled NBAR on the router and the problem disappeared.
I would suggest taking a look at what features you are using and, if you
can, disabling them one at a time to see if it makes any impact.
We then upgraded the ESPs and all has been fine since.

Brian


