nanog mailing list archives

Re: CDN Overload?


From: George Skorup <george () cbcast com>
Date: Tue, 20 Sep 2016 01:14:26 -0500

I have witnessed this issue first-hand for several years; four for sure, maybe five or six. The very first case I remember was a customer doing Usenet downloads with what he called an "internet download manager", which I assumed was screwing with TCP ACKs. I believe he was a 4Mbps user at the time, and this download manager was pushing 2x to maybe 2.5x his subscribed rate, as Mike says, on the upstream-facing router interface. When he shut down or uninstalled the software, it stopped. Yes, this customer is on PTMP fixed wireless. Traffic policing was done with a MikroTik simple queue at the site router. I could cut his downstream rate in half and the traffic would follow, still hitting the backhaul at double the rate. I could also move his queue all the way to the border router and the excess was still there at double the rate.
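A toy model of the situation above, assuming a sender that keeps transmitting at a fixed multiple of the plan rate no matter how much gets dropped: every link between the sender and the rate limiter carries the full offered load, and only the links after the limiter see the plan rate. The `link_loads` helper and the link names are hypothetical, made up for illustration; this is a sketch, not a traffic model of any particular network.

```python
def link_loads(path: list[str], links_before_limiter: int,
               offered_mbps: float, plan_mbps: float) -> dict[str, float]:
    """Steady-state load on each link of a path, listed from the sender toward the subscriber.

    Assumes a sender that keeps transmitting at offered_mbps no matter how much
    is dropped: the first `links_before_limiter` links carry the full offered
    rate, and every link after the rate limiter carries at most the plan rate.
    """
    loads = {}
    for i, link in enumerate(path):
        if i < links_before_limiter:
            loads[link] = offered_mbps                    # excess not yet discarded
        else:
            loads[link] = min(offered_mbps, plan_mbps)    # policed down to the plan
    return loads


# Hypothetical path for a 4Mbps subscriber being sent 8Mbps (2x the plan).
path = ["transit_to_border", "border_to_site", "site_to_ap"]
print(link_loads(path, 2, 8, 4))  # queue on the site router
print(link_loads(path, 1, 8, 4))  # queue moved to the border router
```

Moving the limiter toward the border shortens the stretch of network that carries the excess, but under this assumption the border's upstream-facing interface still sees double the plan rate, which matches what was observed.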

BTW, we still have this guy as a customer on fixed wireless. He's been on 25/5Mbps for over a year, and we're about to upgrade him to 50/10Mbps with new gear. 25/5 and 50/10 are a far cry from this claimed "slow" WISP service. This shit ain't cheap to get to bumfsck Illinois so farmer Joe can watch porn and his kids can watch Netflix at the same time. Yup, we have slow NLOS service too, because customers decide they want the rural life buried in a mile of trees while "needing" the city benefits. If you want the gigabits, then move outta the sticks. Running a hundred combined miles of fiber to reach 20 customers who want to pay less than $50/mo is not feasible. /rant off

Another time, maybe three years ago, we had a customer on Canopy 5.7 FSK at 4/1Mbps using the built-in QoS. He was watching Netflix and I saw 8Mbps hitting the AP's Ethernet interface. I thought the Canopy scheduler was broken, until I looked deeper and saw that it was working exactly as designed, with a 50% discard rate on his VC. I want to say this traffic was from LLNW at the time, but I could be totally wrong about that; I really don't remember.
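The 50% discard rate follows from the numbers: 8Mbps offered to a 4Mbps limit leaves half the traffic with nowhere to go. The sketch below is a generic single-rate token-bucket policer (drop, don't queue), not a model of the Canopy scheduler or of a MikroTik simple queue; the 1500-byte packets, the 10-packet bucket depth and the constant-rate sender are assumptions chosen for illustration.

```python
class TokenBucketPolicer:
    """Generic single-rate policer: a packet that finds too few tokens is dropped, not queued."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.fill_rate = rate_bps / 8        # tokens (bytes) added per second
        self.burst_bytes = burst_bytes       # bucket depth
        self.tokens = burst_bytes            # bucket starts full
        self.last_t = 0.0

    def offer(self, t: float, size_bytes: int) -> bool:
        """Return True if a packet of size_bytes arriving at time t (seconds) conforms."""
        elapsed = t - self.last_t
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.fill_rate)
        self.last_t = t
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False


def drop_fraction(offered_bps: float, plan_bps: float,
                  seconds: float = 10.0, pkt_bytes: int = 1500) -> float:
    """Feed a constant-rate stream through the policer and return the fraction dropped."""
    policer = TokenBucketPolicer(plan_bps, burst_bytes=10 * pkt_bytes)
    interval = pkt_bytes * 8 / offered_bps     # seconds between packets
    n = int(seconds / interval)
    passed = sum(policer.offer(i * interval, pkt_bytes) for i in range(n))
    return 1 - passed / n


# 8Mbps offered to a 4Mbps plan, as in the Canopy case.
print(f"{drop_fraction(8e6, 4e6):.1%} dropped")
```

Note that the policer only decides what gets delivered; the full 8Mbps has already crossed every link in front of it.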

Now let's move on to the Windows 10 updates. A 'buried in the sticks' customer on Canopy 900 FSK, 1.5Mbps/384kbps. Multiple streams from Microsoft and LLNW at the same time. LLNW alone had maybe 10 streams going and was sending at over 15Mbps on average and about 25Mbps at worst... to a 1.5Mbps subscriber. I could throw in a MikroTik queue upstream, but that only moved the problem, as that 15-25Mbps was still hitting the backhaul links. When I have a 100Mbps link going into the site, 25Mbps is a lot.
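Putting those numbers side by side, assuming the whole 15-25Mbps is destined for the one 1.5Mbps subscriber and ignoring everything else on the site link: the sender is at 10x to about 16.7x the plan, the rate limiter has to throw away 13.5-23.5Mbps, and that one download occupies 15-25% of the 100Mbps backhaul. The `backhaul_impact` helper is just a hypothetical name for this arithmetic.

```python
def backhaul_impact(offered_mbps: float, plan_mbps: float,
                    backhaul_mbps: float) -> tuple[float, float, float]:
    """Return (multiple of the plan rate, Mbps the rate limiter must discard,
    fraction of the backhaul link occupied by this one subscriber's download)."""
    overshoot = offered_mbps / plan_mbps
    discarded = max(0.0, offered_mbps - plan_mbps)
    share = offered_mbps / backhaul_mbps
    return overshoot, discarded, share


for offered in (15, 25):  # LLNW average and worst case toward the 1.5Mbps subscriber
    ratio, wasted, share = backhaul_impact(offered, 1.5, 100)
    print(f"{offered}Mbps: {ratio:.1f}x plan, {wasted}Mbps discarded, {share:.0%} of backhaul")
```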

We've had numerous customers call in over the last month or two with 'teh innernets is down, my phoen wyfy don't work either'. No, your Windows 10 updates are overloading your service. Shut off your PC to use your internet service. Having to tell a customer exactly that is ridiculous, but we have to do it.

We had a known issue with a particular licensed microwave vendor's radios that we have in use: the Ethernet buffer became saturated at nowhere near the RF link capacity. They put out a new software release that resolved it, and that was well before this Windows 10 update overload stuff started.

Normal TCP congestion control works perfectly fine. It's not the network; it's the sender not doing normal TCP stuff. I don't know why the CDNs and/or Microsoft think this is a good idea, but to me it looks like a DDoS. I'm on some of the same lists as Mike, and we know of many others reporting similar issues, a couple of them to the tune of 50-100Mbps of overload destined for 5 or 10Mbps tier subscribers. So thanks to Mike for trying to get a conversation going on this topic. And it's not just us red-headed stepchild WISPs.
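For contrast, a deliberately simplified model of what "normal" looks like: additive increase, multiplicative decrease (the congestion-avoidance rule of classic Reno-style TCP), stepped once per round trip, with slow start, queueing and timeouts left out. The thread does not establish what the CDN senders actually do; the `react_to_loss=False` case is only an assumption showing why a sender that ignores loss would produce overloads like the ones described here. The function name and parameters are hypothetical.

```python
def aimd_windows(capacity_pkts: int, rounds: int,
                 react_to_loss: bool = True) -> list[int]:
    """Toy AIMD sender, one step per round trip.

    capacity_pkts: packets per round trip the bottleneck (e.g. a subscriber's
    rate limiter) can deliver; anything beyond that is dropped.
    Returns the congestion window (packets sent) for each round.
    """
    cwnd = 1
    windows = []
    for _ in range(rounds):
        windows.append(cwnd)
        lost = cwnd > capacity_pkts
        if lost and react_to_loss:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease on loss
        else:
            cwnd += 1                  # additive increase, one packet per RTT
    return windows


normal = aimd_windows(capacity_pkts=20, rounds=200)
broken = aimd_windows(capacity_pkts=20, rounds=200, react_to_loss=False)
print(max(normal), broken[-1])
```

In this toy run the loss-responsive sender never sends more than one packet per round trip above the bottleneck (21 against a capacity of 20), while the loss-ignoring one ends at 200, ten times what the bottleneck can deliver.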

On 9/19/2016 10:05 PM, Mike Hammett wrote:
http://www.theregister.co.uk/2016/06/08/is_win_10_ignoring_sysadmins_qos_settings/

This explains the recent situations (well, not really an explanation, but a bit more information from other people). 
Not so much for the ones going back a year or two.




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

----- Original Message -----

From: "Mike Hammett" <nanog () ics-il net>
To: "NANOG" <nanog () nanog org>
Sent: Monday, September 19, 2016 12:34:48 PM
Subject: CDN Overload?

I participate on a few other mailing lists focused on eyeball networks. For a couple of years I've been hearing complaints that this CDN or that CDN was behaving badly, and it has been ramping up severely over the past few months. There have been some wild allegations, but I would like to develop more standardized evidence collection. Initially LimeLight was the only culprit, but recently it has been Microsoft as well. I'm not sure if there have been any others.

The principal complaint is that upstream of whatever is doing the rate limiting for a given customer, significantly more capacity is being used than the customer has purchased. This can happen briefly as TCP adjusts to the capacity limitation, but in some situations it has persisted for days at a time. I'll list a few situations as best as I can recall them. Some of these may even be merges of a couple of situations. The point is to show the general issue and develop a better process for collecting what exactly is happening at the time and how to address it.
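One way to turn "more than the customer purchased, for days at a time" into evidence is to sample a byte counter that sees the subscriber's traffic before the rate limiter, and report only the periods where the rate stays well above the plan for longer than TCP should need to adjust. This sketch assumes the samples are already collected as `(unix_time, byte_counter)` pairs with no counter wraps or resets; how you obtain them (a per-subscriber queue, a flow exporter, etc.) is left open. The 1.5x factor and 5-minute minimum are arbitrary assumptions, not thresholds from this thread, and `sustained_overshoot` is a hypothetical name.

```python
def sustained_overshoot(samples: list[tuple[float, int]], plan_bps: float,
                        factor: float = 1.5,
                        min_seconds: float = 300) -> list[tuple[float, float, float]]:
    """Return (start, end, peak_bps) for every run in which the rate measured
    upstream of the rate limiter stayed above factor * plan_bps for at least
    min_seconds. samples must be sorted by time."""
    events = []
    start = end = None
    peak = 0.0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rate = (c1 - c0) * 8 / (t1 - t0)          # bits per second over this interval
        if rate > factor * plan_bps:
            if start is None:                     # a new overshoot run begins
                start, peak = t0, 0.0
            peak = max(peak, rate)
            end = t1
            continue
        # Rate is back under the threshold: keep the run only if it lasted long enough.
        if start is not None and end - start >= min_seconds:
            events.append((start, end, peak))
        start = None
    if start is not None and end - start >= min_seconds:
        events.append((start, end, peak))
    return events


# Synthetic example: a 1.5Mbps plan that sees 45Mbps upstream for 10 minutes.
samples, t, c = [(0, 0)], 0, 0
for rate_bps in [1.5e6] * 5 + [45e6] * 10 + [1.5e6] * 5:
    t, c = t + 60, c + int(rate_bps * 60 / 8)
    samples.append((t, c))
print(sustained_overshoot(samples, plan_bps=1.5e6))
```

The minimum-duration filter is what separates the brief overshoot of a sender adjusting to the limit from the persistent kind described in this thread.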

One situation had approximately 45 megabit/s of capacity being used up by a customer on a 1.5 megabit/s plan. All other traffic normally held itself within the 1.5 megabit/s, but this particular CDN sent far more than that for extended periods of time.

A frequent occurrence is someone with a single-digit megabit/s plan consuming 2x - 3x their plan rate on the other side of the rate limiter.

Last month on my own network I saw someone with 2x - 3x their plan being consumed upstream, and they had *190* connections downloading that data from Microsoft.
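For cases like the *190* connections, a per-sender breakdown helps show who is sending and how many connections are involved. A sketch that groups exported flow records by source prefix, assuming each record has already been reduced to a `(src_ip, src_port, dst_ip, dst_port, proto, bytes)` tuple; the record layout, the /24 and /48 grouping and the function name are all hypothetical, and mapping a prefix to Microsoft, LLNW or anyone else would still need an origin-AS or whois lookup that isn't shown. The example uses documentation addresses, not real data.

```python
import ipaddress
from collections import defaultdict


def flows_by_source_prefix(flows, subscriber_ip, v4_len=24, v6_len=48):
    """Group flows toward one subscriber by source prefix.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port, proto, bytes) tuples.
    Returns {source_network: (distinct_connections, total_bytes)}, ordered
    from the most connections to the fewest.
    """
    target = ipaddress.ip_address(subscriber_ip)
    conns = defaultdict(set)
    volume = defaultdict(int)
    for src, sport, dst, dport, proto, nbytes in flows:
        if ipaddress.ip_address(dst) != target:
            continue
        addr = ipaddress.ip_address(src)
        plen = v4_len if addr.version == 4 else v6_len
        net = ipaddress.ip_network(f"{addr}/{plen}", strict=False)
        conns[net].add((src, sport, dport, proto))   # one entry per connection
        volume[net] += nbytes
    ranked = sorted(conns, key=lambda n: len(conns[n]), reverse=True)
    return {n: (len(conns[n]), volume[n]) for n in ranked}


subscriber = "198.51.100.7"
flows = [(f"203.0.113.{i % 20 + 1}", 443, subscriber, 40000 + i, 6, 1_000_000)
         for i in range(190)]
flows += [("192.0.2.5", 443, subscriber, 50000 + j, 6, 20_000) for j in range(3)]
for net, (count, total) in flows_by_source_prefix(flows, subscriber).items():
    print(net, count, total)
```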

In the past week or two I've been hearing of people with only a single connection downloading at more than their plan rate.


These situations effectively shut out all other Internet traffic to that customer, or even to a portion of the network in low-capacity NLOS areas. It's a DoS caused by downloads. What happened to the days of MS BITS, when you didn't even notice a download happening? A lot of these guys think the CDNs are just a pile of dicks looking to ruin everyone's day, but I'm certain there are at least a couple of people at each CDN who aren't that way. ;-)




Lots of rambling, sure. What do I need to have these guys collect as evidence of a problem, and who should they send it to?



