nanog mailing list archives

Re: Lossy cogent p2p experiences?


From: Tom Beecher <beecher () beecher cc>
Date: Tue, 5 Sep 2023 17:13:01 -0400


Cogent support has been about as bad as you can get: everything is
great; clean your fiber; iperf isn't a good test; install a physical
loop (oh wait, we don't want that, go pull it back off); new updates at
three-to-seven-day intervals; etc.  If the performance had never been
good to begin with I'd have just attributed this to their circuits, but
since it worked until late June, I know something has changed.  I'm
hoping someone else has run into this and maybe knows of some hints I
could give them to investigate.  To me it sounds like there's a rate
limiter / policer defined somewhere in the circuit, or an overloaded
interface/device we're forced to traverse, but they assure me this is
not the case and claim to have destroyed and rebuilt the logical
circuit.


Sure smells like port buffer issues somewhere in the middle (mismatched
deep/shallow buffers, or something configured to support jumbo frames
but with buffers not sized for them).
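
For a sense of scale, the bandwidth-delay product of the path described
below (10 Gb/s at ~52 ms RTT, using the numbers from the original post)
works out to:

    10 Gb/s x 0.052 s = 520 Mb ≈ 65 MB in flight

The classic rule of thumb sizes a bottleneck buffer at roughly one BDP
for a single TCP flow, so a shallow-buffered device with a few MB of
shared packet memory can't absorb bursts on a path like this; any speed
mismatch shows up as tail drops rather than queueing.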

On Thu, Aug 31, 2023 at 11:57 AM David Hubbard <
dhubbard () dino hostasaurus com> wrote:

Hi all, curious if anyone who has used Cogent as a point-to-point
provider has gone through packet loss issues with them and was able to
resolve them successfully.  I've got a non-rate-limited 10 gig circuit
between two geographic locations with about 52 ms of latency.  Mine is
set up to support both jumbo frames and VLAN tagging.  I do know Cogent
packetizes these circuits, so they're not like waves, and the expected
single-session TCP performance may be limited to a few gbit/sec, but I
should otherwise be able to fully utilize the circuit given enough
flows.
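
(The few-gbit/sec single-flow expectation is just window math:
steady-state TCP throughput is bounded by window size divided by RTT.
Illustrative figures for this path, assuming common socket buffer sizes
rather than anything measured:

    16 MB window / 0.052 s ≈ 2.5 Gb/s
    65 MB window / 0.052 s ≈ 10 Gb/s  (one full BDP: line rate)

A stack that can't open its window to tens of MB can't fill a 10G path
at this distance, no matter how clean the circuit is.)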



Circuit went live earlier this year with zero issues.  Testing with
common tools like iperf would show several gbit/sec of TCP traffic on a
single flow, even without an optimized TCP stack; using parallel flows
or UDP we could easily get close to wire speed.  Starting about ten
weeks ago we saw a significant slowdown, and at times complete failure,
of bursty data replication tasks between equipment using this circuit.
Rounds of testing show that new flows often experience significant
initial packet loss of several thousand packets, then lesser ongoing
loss every five to ten seconds after that.  At times we can't do better
than 50 Mbit/sec, and we rarely achieve even a gigabit unless we run
many streams with a lot of tuning.  With UDP we also see the loss, but
can still push many gigabits through with one sender, or wire speed
with several nodes.
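
(For anyone reproducing this kind of test, iperf3 invocations along
these lines would show the pattern described above; <far-end> is a
placeholder for the remote endpoint:

    iperf3 -c <far-end> -t 30 -i 1        # single TCP flow
    iperf3 -c <far-end> -t 30 -P 8        # 8 parallel TCP flows
    iperf3 -c <far-end> -t 30 -u -b 5G    # UDP at 5 Gb/s

The -i 1 per-second interval reporting is what makes the periodic
every-five-to-ten-second loss easy to spot.)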



For equipment that doesn't have a tunable TCP stack, such as storage
arrays or VMware hosts, the retransmits completely ruin performance or
cause ongoing failures we can't overcome.
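
(On hosts where the stack is tunable, the usual long-fat-network knobs
on Linux look something like the following; the 64 MB ceiling is
illustrative, sized near one BDP for this path, not a tested
recommendation:

    sysctl -w net.core.rmem_max=67108864
    sysctl -w net.core.wmem_max=67108864
    sysctl -w net.ipv4.tcp_rmem="4096 131072 67108864"
    sysctl -w net.ipv4.tcp_wmem="4096 131072 67108864"
    # BBR doesn't treat loss as its primary congestion signal, so it
    # rides through periodic drops far better than CUBIC
    sysctl -w net.ipv4.tcp_congestion_control=bbr

Storage arrays and hypervisors that expose none of these knobs are
stuck with whatever window the vendor shipped, which is the failure
mode described above.)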



Cogent support has been about as bad as you can get: everything is
great; clean your fiber; iperf isn't a good test; install a physical
loop (oh wait, we don't want that, go pull it back off); new updates at
three-to-seven-day intervals; etc.  If the performance had never been
good to begin with I'd have just attributed this to their circuits, but
since it worked until late June, I know something has changed.  I'm
hoping someone else has run into this and maybe knows of some hints I
could give them to investigate.  To me it sounds like there's a rate
limiter / policer defined somewhere in the circuit, or an overloaded
interface/device we're forced to traverse, but they assure me this is
not the case and claim to have destroyed and rebuilt the logical
circuit.
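
(One way to test the policer theory: sweep UDP offered load and watch
where loss begins.  A policer drops everything above a fixed rate
regardless of how the traffic is shaped, while buffer exhaustion tracks
burstiness and flow count.  A hypothetical sweep, <far-end> again a
placeholder:

    for rate in 1G 2G 4G 6G 8G; do
        iperf3 -c <far-end> -u -b $rate -t 10
    done

A sharp loss knee at one consistent rate points at a policer; loss that
scales with burst size instead points back at buffers.)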



Thanks!

