nanog mailing list archives

Re: high latency ds3 issue on unloaded line


From: chip <chip.gwyn () gmail com>
Date: Fri, 26 Sep 2008 13:01:40 -0400

Mike,

  I've seen issues similar to this when using the 12-port DS3 cards (Engine
0) in Cisco GSRs.  Basically, if any single DS3 on any of the 12 ports is
full, the buffers on the card fill up and every other port on that card has
its traffic queued, which introduces latency.  There are some things that
can be done to help with the situation, but not much to actually resolve
the issue.  One is to use WRED instead of FIFO on the interfaces at the
provider side.  A more effective, but more invasive, solution is to set
each card's tx queue limit to 1, so that when something is waiting on the
blocked port, packets are dropped instead of queued.
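
To make the head-of-line effect concrete, here is a toy model. It is not a description of the real Engine 0 buffer design: it simply assumes one FIFO shared by every port on the card, where each packet must finish transmitting before anything behind it moves. The `shared_fifo_delays` helper, port numbers, and packet sizes are all hypothetical.

```python
from collections import namedtuple

# Hypothetical packet record: the egress port it is headed for and its size.
Packet = namedtuple("Packet", ["port", "size_bytes"])


def shared_fifo_delays(queue, port_rate_bps):
    """Toy head-of-line model: a single FIFO shared by all ports on a card.

    Every packet waits for everything ahead of it to be serialized, even if
    its own port is idle. Returns each packet's departure time in seconds,
    in queue order.
    """
    clock = 0.0
    departures = []
    for pkt in queue:
        # Serialize this packet at its own port's line rate.
        clock += pkt.size_bytes * 8 / port_rate_bps[pkt.port]
        departures.append(clock)
    return departures


DS3_BPS = 44.736e6  # nominal DS3 line rate
rates = {0: DS3_BPS, 1: DS3_BPS}
# 1000 full-size packets for congested port 0, then one small ping for idle port 1.
queue = [Packet(0, 1500)] * 1000 + [Packet(1, 64)]
delays = shared_fifo_delays(queue, rates)
print(f"ping for idle port 1 left after {delays[-1] * 1000:.0f} ms")
```

In this model the ping waits roughly a quarter of a second purely because of traffic for another port. Capping the queue at 1, as suggested above, means most of those 1000 packets would be dropped rather than parked ahead of the ping: latency goes down, drops on the congested port go up.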

Applied Globally:
cos-queue-group 1-16
  precedence all random-detect-label 0
  random-detect-label 0 1 16 1

Interfaces clocked at less than 45Mb:
  tx-cos 1-16
  tx-queue-limit 1
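
WRED drops packets probabilistically as the average queue depth grows, instead of letting the queue fill and tail-drop. The sketch below is the generic RED drop curve, not the GSR's actual implementation. It assumes the three numbers after the label in `random-detect-label 0 1 16 1` are the minimum threshold, maximum threshold, and mark-probability denominator, and the EWMA weight of 1/512 is an arbitrary illustrative value.

```python
def red_drop_probability(avg_depth, min_th, max_th, mark_prob_denom):
    """Generic RED drop curve (a sketch, not Cisco's exact code).

    Below min_th nothing is dropped; between the thresholds the drop
    probability rises linearly up to 1/mark_prob_denom; at or above
    max_th every packet is dropped.
    """
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    max_p = 1.0 / mark_prob_denom
    return max_p * (avg_depth - min_th) / (max_th - min_th)


def update_average(avg_depth, instantaneous_depth, weight=1 / 512):
    """Exponentially weighted moving average of queue depth (illustrative weight)."""
    return (1 - weight) * avg_depth + weight * instantaneous_depth


# Thresholds from the example config above (parameter order is assumed).
MIN_TH, MAX_TH, DENOM = 1, 16, 1
for depth in (0, 1, 4, 8.5, 16, 40):
    print(depth, round(red_drop_probability(depth, MIN_TH, MAX_TH, DENOM), 3))
```

With a denominator of 1 the curve ramps all the way to certain drop at the maximum threshold, so a standing queue is kept short rather than allowed to grow until tail drop.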


Mostly this is just done on the interfaces that are configured with a clock
rate less than 45Mb.  Of course, all of this will need to be done on the
Qwest side.  Since you only see this issue during the day, it points to a
traffic problem.  If Qwest needs some proof, have them run these commands
while you're seeing the high latency:

execute-on slot <slot> show controllers frfab queue
execute-on slot <slot> show controllers tofab queue

That is, if they're using those cards on a GSR.

Hope that helps.

--chip

On Fri, Sep 26, 2008 at 12:28 PM, Aaron Wendel
<aaron () wholesaleinternet net> wrote:

Have you taken some traffic captures to see what kind of traffic's coming
through?  Could be an infected machine sending lots of small packets from
lots of spoofed addresses.  I've seen that kind of thing cause issues with
older routers before.
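
One quick way to check this theory is to summarise a capture by packet size and number of distinct source addresses: a flood of small packets from many spoofed sources shows up as a low mean size and an unusually high source count. This sketch assumes the capture has already been reduced to `(source_ip, length)` pairs by some other tool; the `summarize` helper, the 200-byte cut-off, and the sample data are hypothetical, not taken from the thread.

```python
from statistics import mean


def summarize(records, small_threshold=200):
    """Summarise (source_ip, length_bytes) pairs from a packet capture.

    Expects at least one record. Returns packet count, distinct sources,
    mean size, and the fraction of packets smaller than small_threshold
    bytes (an arbitrary cut-off).
    """
    sizes = [length for _, length in records]
    sources = {src for src, _ in records}
    small = sum(1 for s in sizes if s < small_threshold)
    return {
        "packets": len(records),
        "distinct_sources": len(sources),
        "mean_size": mean(sizes),
        "small_fraction": small / len(sizes),
    }


# Made-up sample: 900 tiny packets, each from a different source, plus bulk traffic.
sample = [(f"10.0.{i // 256}.{i % 256}", 60) for i in range(900)]
sample += [("192.0.2.10", 1500)] * 100
print(summarize(sample))
```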



-----Original Message-----
From: mike [mailto:mike-nanog () tiedyenetworks com]
Sent: Friday, September 26, 2008 11:04 AM
To: nanog () nanog org
Subject: high latency ds3 issue on unloaded line

Hello,

   I have a DS3 from Qwest that has daily issues with insane
point-to-point latencies, sometimes exceeding 1000ms for hours on end,
that suddenly disappear and do not appear to correspond with actual
measured link utilization (less than 20Mbps most days).
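
For scale, queuing delay is just the data sitting ahead of a packet divided by the rate it drains at. A minimal sketch, assuming the nominal 44.736 Mbps DS3 line rate (usable payload capacity is somewhat lower) and that the added latency is pure queuing, of how much buffered data it takes to add a full second:

```python
DS3_LINE_RATE_BPS = 44.736e6  # nominal DS3 rate; payload capacity is a bit lower


def queuing_delay_s(queued_bytes, rate_bps=DS3_LINE_RATE_BPS):
    """Seconds for queued_bytes to drain at rate_bps."""
    return queued_bytes * 8 / rate_bps


def bytes_for_delay(delay_s, rate_bps=DS3_LINE_RATE_BPS):
    """Bytes that must be queued ahead of a packet to delay it by delay_s."""
    return delay_s * rate_bps / 8


# Roughly 5.6 MB must sit ahead of a ping to add one second at DS3 rate.
print(f"{bytes_for_delay(1.0) / 1e6:.2f} MB for 1000 ms")
print(f"{bytes_for_delay(1.0) / 1500:.0f} full-size 1500-byte packets")
```

So if the delay really is queuing, a sustained 1000ms implies several megabytes waiting ahead of each probe somewhere in the path.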

   To make a long investigation short, the problem comes on during the
day and then lets up late in the evening. I have tested and examined
everything at the IP layer, and no, it's not high utilization, an ACL,
router CPU, or bad hardware; there are no line errors or other issues
visible in the interface or controller stats. Yes, I have flushed all
hardware, and I have a 7204VXR/NPE-400 with this single DS3. The only
clue seems to be millions of 'output drops' on Qwest's side. At night I
can hit popular FTP mirrors from a directly attached server and watch my
interface report about 100% utilization combined with my users and
customers, so yes, it really is a full line-rate DS3. Historically,
MRTG always shows around 20Mbps or less utilization, and it's only
SmokePing that goes off, usually in the afternoon, when the
point-to-point latencies between my router and Qwest start heading
north, and consistently at that. I also have another in-house tool that
takes 30-second snapshots of my DS3 interface in order to catch short
bursts that would be smoothed out by MRTG's 5-minute average, but during
these high-latency times there aren't any spikes noted. And for added
confusion (or fun!), the latency can start at any utilization level:
I've observed it while we were pulling just 12Mbps, and I have not had
it while we were doing 34Mbps. Only the time of day seems to be the
common factor.
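
The point of 30-second snapshots is that averaging hides bursts. A minimal sketch, using made-up octet-counter readings rather than real data, of turning cumulative interface byte counters into per-interval rates and comparing the peak against the 5-minute average MRTG would plot (counter wrap is ignored for brevity):

```python
def rates_bps(counter_samples, interval_s):
    """Convert cumulative byte-counter samples into per-interval bit rates."""
    return [
        (later - earlier) * 8 / interval_s
        for earlier, later in zip(counter_samples, counter_samples[1:])
    ]


INTERVAL_S = 30
# Hypothetical readings: 10 snapshots span 5 minutes, with one 30 s burst.
steady = 15_000_000 // 8 * INTERVAL_S  # bytes per interval at 15 Mbps
burst = 44_000_000 // 8 * INTERVAL_S   # bytes per interval at 44 Mbps
increments = [steady] * 4 + [burst] + [steady] * 5
counters = [0]
for inc in increments:
    counters.append(counters[-1] + inc)

per_interval = rates_bps(counters, INTERVAL_S)
five_min_avg = sum(per_interval) / len(per_interval)
print(f"peak {max(per_interval) / 1e6:.1f} Mbps, "
      f"5-min average {five_min_avg / 1e6:.1f} Mbps")
```

Here a near-line-rate burst averages out to under 18 Mbps over five minutes. In this case the 30-second samples showed no such spikes during the high-latency periods, which is what rules this explanation out.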

   Qwest has not been able to identify the issue, only to note that,
yes, this really is happening when there is otherwise no real load on
the line, and I am certain we have done everything to rule out the IP
layer. They have put in a 'request' to move me to another router, but I
am not hopeful of a resolution that way, as the router we're currently
on doesn't appear to have the problem with any other subscriber.

   What I want to know is: is it possible that the underlying ATM/SONET
that carries my DS3 from my facility is somehow oversubscribed or
misconfigured? We have an OC12 fiber entrance, and this is the only
circuit provisioned on it; in our tiny town the only other user on the
ring with us is Comcast (according to the AT&T network engineer who
installed this). I don't know enough about ATM/SONET to imagine
conditions that would cause the issues I am seeing here, but every IP
layer tool I have only ever tells me there isn't an IP issue here. I can
ping from my router directly to the attached Qwest router and get
>1000ms, and at other times (outside the problem window) I get 4ms.
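
Some nominal rate arithmetic for the OC12 mentioned above. An OC-n interleaves n STS-1 signals at 51.84 Mbps each, and a DS3 is commonly mapped into one STS-1; this sketch assumes that mapping, which the thread does not confirm for this circuit, and uses only the standard nominal rates.

```python
STS1_MBPS = 51.84   # SONET STS-1 (OC-1) line rate
DS3_MBPS = 44.736   # nominal DS3 line rate


def oc_line_rate_mbps(n):
    """Line rate of an OC-n, which interleaves n STS-1 signals."""
    return n * STS1_MBPS


# Assumes one DS3 per STS-1, so an OC-12 offers 12 such slots.
slots = 12
used = 1  # the thread reports this DS3 is the only circuit on the OC12
print(f"OC-12 line rate: {oc_line_rate_mbps(slots):.2f} Mbps")
print(f"STS-1 slots in use: {used} of {slots}")
print(f"DS3 share of its STS-1: {DS3_MBPS / STS1_MBPS:.1%}")
```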

   If anyone has laughs or beers to offer me, send 'em on cuz I could
use both right about now....

Mike-

-- 
Just my $.02, your mileage may vary,  batteries not included, etc....

