nanog mailing list archives

Re: CDN Overload?


From: Martin Hannigan <hannigan () gmail com>
Date: Thu, 22 Sep 2016 19:29:38 -0400

Mike,

I have the right contact there and I'll flag this thread that way in
case they haven't already seen it.

Best,

Martin Hannigan
AS 20940 // AS 32787



On Thursday, September 22, 2016, Mike Hammett <nanog () ics-il net> wrote:

Do we have any contacts at Microsoft that we can talk to about this? This
time around, they are the common denominator. I know people have been
complaining about this for longer than Windows 10 has been out, so there
must be some other reasons why other parties are to blame as well.

-----
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP

----- Original Message -----
From: Bruce Curtis <bruce.curtis () ndsu edu>
To: Mike Hammett <nanog () ics-il net>
Cc: Martin Hannigan <hannigan () gmail com>, NANOG <nanog () nanog org>
Sent: Thu, 22 Sep 2016 16:28:17 -0500 (CDT)
Subject: Re: CDN Overload?


  I have seen traffic from Microsoft in Europe to single hosts on our
campus that seemed to be unusually high in rate (bps) and long in duration.

  I don’t recall if the few multiple hosts I noticed this on over time
were only on our campus wifi.

  If not, perhaps the common factor is longer latency?  Both connections over
wireless and connections from Europe to the US would have longer latency.

  Perhaps this longer latency combined with some other factor is
triggering a bug in modern TCP Congestion Control algorithms?



This mentions that there have been bugs in TCP Congestion Control
algorithm implementations.   Perhaps there could be other bugs that result
in the described issue?

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/ms_feb07_eval.ppt.pdf


I have seen cases on our campus where too small buffers on an ethernet
switch caused a Linux TCP Congestion Control algorithm to act badly
resulting in slower downloads than a simple algorithm that depended on
dropped packets rather than trying to determine window sizes etc.  The fix
in that case was to increase the buffer size.  Of course buffer bloat is
also known to play havoc with TCP Congestion Control algorithms.  Just
wondering if some combination of higher latency and another unknown
variable or just a bug might cause a TCP Congestion Control algorithm to
think it can safely try to increase the transmit rate?
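As a rough illustration of why latency might matter here, the sketch below computes the bandwidth-delay product (BDP), the amount of data a sender needs in flight to fill a path. It does not reproduce any bug; it only shows that a high-latency path carries far more unacknowledged data than a local one at the same rate, so a mistaken rate estimate takes longer to correct. The function name and the example rates and RTTs are assumptions chosen for illustration, not measurements from this thread.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return bandwidth_bps * rtt_seconds / 8


# Hypothetical example values, not measurements from this thread.
local_path = bdp_bytes(100e6, 0.002)     # 100 Mbit/s at 2 ms RTT
transatlantic = bdp_bytes(100e6, 0.120)  # 100 Mbit/s at 120 ms RTT

print(f"local path BDP:    {local_path:,.0f} bytes")
print(f"transatlantic BDP: {transatlantic:,.0f} bytes")
```

With these assumed numbers the transatlantic path holds about 60 times as much data in flight as the local one, which is also the scale of buffering a switch or rate limiter would need to absorb a burst.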


On Sep 21, 2016, at 8:29 PM, Mike Hammett <nanog () ics-il net> wrote:

Thanks Marty. I have only experienced this on my network once and it was
directly with Microsoft, so I haven't done much until a couple days ago
when I started this campaign. I don't know if anyone else has brought this
to anyone's attention. I just sent an e-mail to Owen when I saw yours.




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

----- Original Message -----

From: "Martin Hannigan" <hannigan () gmail com>
To: "Mike Hammett" <nanog () ics-il net>
Cc: "NANOG" <nanog () nanog org>
Sent: Wednesday, September 21, 2016 8:19:35 PM
Subject: Re: CDN Overload?





Mike,


I will forward to the requisite group for a look. Have you brought this
to our attention previously? I don't see anything. If you did, please
forward me the ticket numbers or message(s) (peering@ is best) so we can
track down and see if someone already has it in queue.


Jared alluded to fasttcp a few emails ago. Astute man.


Best,


Martin Hannigan
AS 20940 // AS 32787





On Sep 21, 2016, at 14:30, Mike Hammett <nanog () ics-il net> wrote:




https://docs.google.com/spreadsheets/d/1Jdm0dOBf81kSnXEvVfI6ZJbWFNt5AbYUV8CDxGwLSm8/edit?usp=sharing

I have made the anonymized answers public. This will obviously have some
bias to it given that I mostly know fixed wireless operators, but I'm
hoping this gets some good distribution to catch more platforms.




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

----- Original Message -----

From: "Mike Hammett" <nanog () ics-il net>
To: "NANOG" <nanog () nanog org>
Sent: Wednesday, September 21, 2016 9:08:55 AM
Subject: Re: CDN Overload?

https://goo.gl/forms/LvgFRsMdNdI8E9HF3

I have made this into a Google Form to make it easier to track compared
to randomly formatted responses on multiple mailing lists, Facebook Groups,
etc.




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

----- Original Message -----

From: "Mike Hammett" <nanog () ics-il net>
To: "NANOG" <nanog () nanog org>
Sent: Monday, September 19, 2016 12:34:48 PM
Subject: CDN Overload?


I participate on a few other mailing lists focused on eyeball networks.
For a couple years I've been hearing complaints that this CDN or that CDN
was behaving badly. It's been severely ramping up the past few months.
There have been some wild allegations, but I would like to develop a bit
more standardized evidence collection. Initially LimeLight was the only
culprit, but recently it has been Microsoft as well. I'm not sure if there
have been any others.

The principal complaint is that upstream of whatever is doing the rate
limiting for a given customer there is significantly more capacity being
utilized than the customer has purchased. This could happen briefly as TCP
adjusts to the capacity limitation, but in some situations this has
persisted for days at a time. I'll list out a few situations as best as I
can recall them. Some of these may even be merges of a couple situations.
The point is to show the general issue and develop a better process for
collecting what exactly is happening at the time and how to address it.

One situation had approximately 45 megabit/s of capacity being used up
by a customer that had a 1.5 megabit/s plan. All other traffic normally
held itself within the 1.5 megabit/s, but this particular CDN sent
excessively more for extended periods of time.

A common occurrence has someone with a single-digit megabit/s limitation
consuming 2x - 3x more than their plan on the other side of the rate
limiter.

Last month on my own network I saw someone with 2x - 3x being consumed
upstream and they had *190* connections downloading said data from
Microsoft.

The past week or two I've been hearing of people only having a single
connection downloading at more than their plan rate.
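The situations above can be expressed as two numbers: how far the upstream rate exceeds the plan, and for how long. The following is a minimal sketch, assuming rate samples taken upstream of the rate limiter as (timestamp in seconds, bits per second) pairs. The function names, the sample format, and the 1.5x threshold are all assumptions for illustration; the 45 megabit/s and 1.5 megabit/s figures come from the first situation described.

```python
from typing import List, Sequence, Tuple


def overshoot_ratio(upstream_bps: float, plan_bps: float) -> float:
    """How many times the purchased rate is arriving upstream of the limiter."""
    return upstream_bps / plan_bps


def sustained_overshoot(
    samples: Sequence[Tuple[float, float]],
    plan_bps: float,
    factor: float = 1.5,  # assumed threshold; pick whatever separates bursts from abuse
) -> List[Tuple[float, float]]:
    """Return (first_ts, last_ts) for each run of consecutive samples above factor * plan."""
    intervals: List[Tuple[float, float]] = []
    start = None
    last = None
    for ts, bps in samples:
        if bps > factor * plan_bps:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            intervals.append((start, last))
            start = None
    if start is not None:
        intervals.append((start, last))
    return intervals


# The 45 megabit/s on a 1.5 megabit/s plan case from the thread.
print(overshoot_ratio(45e6, 1.5e6))  # 30.0

# Hypothetical one-minute samples, not real measurements.
samples = [(0, 1.4e6), (60, 45e6), (120, 44e6), (180, 1.5e6), (240, 40e6)]
print(sustained_overshoot(samples, plan_bps=1.5e6))
```

A brief overshoot while TCP adjusts would show up as a short interval or none at all; the cases described here would show intervals spanning hours or days.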


These situations effectively shut out all other Internet traffic to that
customer or even portion of the network for low capacity NLOS areas. It's a
DoS caused by downloads. What happened to the days of MS BITS and you
didn't even notice the download happening? A lot of these guys think that
the CDNs are just a pile of dicks looking to ruin everyone's day and I'm
certain that there are at least a couple people at each CDN that aren't
that way. ;-)




Lots of rambling, sure. What do I need to have these guys collect as
evidence of a problem and who should they send it to?
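One possible shape for that evidence, offered as an assumption rather than an agreed format, is a per-source summary of flows seen upstream of the rate limiter: how many connections each remote network had open to the customer and what average rate they delivered over the capture window. The sketch below assumes flow records have already been exported as (remote IP, byte count) pairs; the record format, the function name, and the /24 grouping are hypothetical choices, and the addresses are documentation ranges.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_flows(
    flows: Iterable[Tuple[str, int]],
    window_seconds: float,
    prefix_len: int = 24,  # assumed grouping for IPv4 sources
) -> Dict[str, Dict[str, float]]:
    """Group flows by remote prefix; report flow count and average bits per second."""
    totals = defaultdict(lambda: {"flows": 0, "bytes": 0})
    for remote_ip, byte_count in flows:
        # strict=False masks the host bits so any address maps to its prefix.
        net = ipaddress.ip_network(f"{remote_ip}/{prefix_len}", strict=False)
        entry = totals[str(net)]
        entry["flows"] += 1
        entry["bytes"] += byte_count
    return {
        net: {"flows": e["flows"], "avg_bps": e["bytes"] * 8 / window_seconds}
        for net, e in totals.items()
    }


# Hypothetical records from an 8-second capture window.
flows = [
    ("198.51.100.10", 1_000_000),
    ("198.51.100.20", 2_000_000),
    ("203.0.113.5", 500_000),
]
for net, stats in summarize_flows(flows, window_seconds=8).items():
    print(net, stats)
```

A summary like this, together with the customer's plan rate and the capture time, would distinguish the "190 connections" case from the "single connection above plan rate" case.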




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP







---
Bruce Curtis                         bruce.curtis () ndsu edu
Certified NetAnalyst II                701-231-8527
North Dakota State University






