nanog mailing list archives

Re: TWC (AS11351) blocking all NTP?


From: Christopher Morrow <morrowc.lists () gmail com>
Date: Mon, 3 Feb 2014 17:58:21 -0500

wait, so the whole of the thread is about stopping participants in the
attack, and you're suggesting that removing/changing end-system
switch/routing gear and doing something more complex than:
  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any

is a good plan?

I'd direct you at:
  <https://www.nanog.org/resources/tutorials>

and particularly at:
 "Tutorial: ISP Security - Real World Techniques II"
 <https://www.nanog.org/meetings/nanog23/presentations/greene.pdf>

On Mon, Feb 3, 2014 at 5:16 PM, Peter Phaal <peter.phaal () gmail com> wrote:
On Mon, Feb 3, 2014 at 12:38 PM, Christopher Morrow
<morrowc.lists () gmail com> wrote:
On Mon, Feb 3, 2014 at 2:42 PM, Peter Phaal <peter.phaal () gmail com> wrote:
On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow
<morrowc.lists () gmail com> wrote:
On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal () gmail com> wrote:

There's certainly the case that you could drop acls/something on
equipment to selectively block the traffic that matters... I suspect
in some cases the choice was: "50% of the edge box customers on this
location are a problem, block it across the board here instead of X00
times" (see concern about tcam/etc problems)

I agree that managing limited TCAM space is critical to the
scalability of any mitigation solution. However, tying up TCAM space
on every edge device with filters to prevent each new threat is likely

yup, there's a tradeoff, today it's being made one way, tomorrow
perhaps a different way. My point was that today the percentage of sdn
capable devices is small enough that you still need a decimal point to
measure it. (I bet, based on total devices deployed) The percentage of
oss backend work done to do what you want is likely smaller...

the folk in NZ-land (Citylink, reannz ... others - find josh bailey /
cardigan) are making some strides, but only in the exchange areas so
far. fun stuff... but not the deployed gear as an L2/L3 device in
TWC/Comcast/Verizon.

I agree that today most networks aren't SDN ready, but there are
inexpensive switches on the market that can perform these functions
and for providers that have them in their network, this is an option
today. In some environments, it could also make sense to drop in a
layer of switches to monitor and control traffic entering / exiting the
network.

it's probably not a good plan to forklift your edge, for dos targets
where all you really need is a 3 line acl.


The current 10G upgrade cycle provides an opportunity to deploy

by 'current 10g upgrade cycle' you mean the one that happened 2-5 yrs
ago? or something newer? did you mean 100G?

I was referring to the current upgrade cycle in data centers, with
servers connected with 10G rather than 1G adapters. The high volumes
are driving down the cost of 10/40/100G switches.

again, lots of cost and churn for 3 lines of acl... I'm not sold.

With integrated hybrid OpenFlow, there is very little activity on the
OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes
handle forwarding of packets. OpenFlow is only used to selectively
override specific FIB entries.

that didn't really answer the question :) if I have 10k customers
behind the edge box and some of them NOW start being abused, then more
later and that mix changes... if it changes a bunch because the
attacker is really attackers. how fast do I change before I can't do
normal ops anymore?

Good point - the proposed solution is most effective for protecting
customers that are targeted by DDoS attacks. While trying to prevent

Oh, so the 3 line acl is not an option? or (for a lot of customers a
fine answer) null route? Some things have changed in the world of dos
mitigation, but a bunch of the basics still apply. I do know that in
the unfortunate event that your network is the transit or terminus of
a dos attack at high volume you want to do the least configuration
that'll satisfy the 2 parties involved (you and your customer)...
doing a bunch of hardware replacement and/or sdn things when you can
get the job done with some acls or routing changes is really going to
be risky.

attackers entering the network is good citizenship, the value and
effectiveness of the mitigation service increases as you get closer to
the target of the attack. In this case there typically aren't very
many targets and so a single rule filtering on destination IP address
and protocol would typically be effective (and less disruptive to the
victim than null routing).


Typical networks probably only see a few DDoS attacks an hour at the
most, so pushing a few rules an hour to mitigate them should have
little impact on the switch control plane.

based on what math did you get 'few per hour?' As an endpoint (focal
point) or as a contributor? The problem that started this discussion
was being a contributor...which I bet happens a lot more often than
/few an hour/.

I am sorry, I should have been clearer, the SDN solution I was
describing is aimed at protecting the target's links, rather than
mitigating the botnet and amplification layers.

and i'd say that today sdn is out of reach for most deployments, and
that the simplest answer is already available.

The number of attacks was from the perspective of DDoS targets and
their service providers.  If you are considering each participant in
the attack the number goes up considerably.

I bet roland has some good round-numbers on number of dos attacks per
day... I bet it's higher than a few per hour globally, for the ones
that get noticed.

A good working definition of a large flow is 10% of a link's
bandwidth. If you only trigger actions for large flows then in the
worst case you would only require 10 rules per port to change how
these flows are treated.

10% of a 1g link is 100mbps. For contributors to ntp attacks, many of
the contributors are sending ONLY 300x the input, so less than
100mbps. On a 10g link it's 1G... even more hidden.

This math and detection aren't HARD, but tuning it can be a bit challenging.
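
To make the arithmetic in this exchange concrete, here is a minimal Python sketch of the "large flow = 10% of link bandwidth" rule. The function names and the 50 Mbit/s reflector rate are hypothetical, chosen only for illustration; nothing here comes from sFlow-RT or any vendor tool.

```python
# Sketch of the "large flow" threshold arithmetic discussed above.
# All names are hypothetical; rates are in bits per second.

def large_flow_threshold_bps(link_bps, percent=10):
    """Rate above which a flow counts as 'large' on this link."""
    return link_bps * percent // 100

def max_large_flows_per_port(percent=10):
    """Worst case: how many flows of at least `percent` fit on one link."""
    return 100 // percent

def is_large_flow(flow_bps, link_bps, percent=10):
    """True if the flow meets or exceeds the large-flow threshold."""
    return flow_bps >= large_flow_threshold_bps(link_bps, percent)

ONE_GIG = 1_000_000_000
TEN_GIG = 10_000_000_000

print(large_flow_threshold_bps(ONE_GIG))   # threshold on a 1G link
print(large_flow_threshold_bps(TEN_GIG))   # threshold on a 10G link
# A hypothetical reflector sending 50 Mbit/s stays under the 1G threshold,
# which is the point made above about contributors going unnoticed.
print(is_large_flow(50_000_000, ONE_GIG))
```

The same threshold that bounds the rule count at 10 per port is also what lets a sub-threshold contributor slip through, so the percentage is the tuning knob being argued about here.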

Agreed - the technique is less effective for addressing the
contributors to the attack. RPF and other edge controls should be

note that the focus of the original thread was on the contributors. I
think the target part of the problem has been solved since before the
slides in the pdf link at the top...

applied, but until everyone participates and eliminates attacks at
source, there is still a value in filtering close to the target of the
attack.


http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html

The example can be modified to target NTP mon_getlist requests and
responses using the following sFlow-RT flow definition:

{keys:'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}

or to target DNS ANY requests:

{keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}


this also assumes almost 1:1 sampling... which might not be feasible
either...otherwise you'll be seeing fairly lossy results, right?

Actually, to detect large flows (defined as 10% of link bandwidth)
within a second, you would only require the following sampling rates:

your example requires seeing the 1st packet in a cycle, and seeing
into the first packet. that's going to require either acceptance of
loss (and gathering the loss in another rule/fashion) or 1:1 sampling
to be assured of getting ALL of the DNS packets and seeing what was
queried.

The flow analysis is stateless - based on a random sample of 1 in N
packets, you can decode the packet headers and determine the amount of
traffic associated with specific DNS queries. If you are looking at

you're getting pretty complicated for the target side:
  ip access-list 150 permit ip any any log

(note this is basically taken verbatim from the slides)

view logs, see the overwhelming majority are to hostX port Y proto
Z... filter, done.
you can do that in about 5 mins time, quicker if you care to rush a bit.
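
The "view logs, find the dominant hostX/portY/protoZ" step can be sketched in a few lines of Python. This assumes the ACL log lines have already been parsed into (destination, port, protocol) tuples; the parsing is platform specific and not shown, and the function name, the 50% cutoff and the sample addresses are all hypothetical.

```python
from collections import Counter

def dominant_target(entries, min_share=0.5):
    """Return the (dst, port, proto) tuple that accounts for at least
    `min_share` of the logged packets, or None if nothing dominates.

    `entries` is an iterable of already-parsed tuples (an assumption;
    real ACL log formats vary by platform and are not parsed here).
    """
    entries = list(entries)
    if not entries:
        return None
    target, hits = Counter(entries).most_common(1)[0]
    return target if hits / len(entries) >= min_share else None

# Hypothetical parsed log: most hits go to one host/port/proto.
log = [("192.0.2.10", 80, "udp")] * 8 + [("192.0.2.20", 53, "udp")] * 2
print(dominant_target(log))
```

Whatever tuple comes back is the candidate for the filter or null route described above.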

the traffic close to the target, there may be hundreds of thousands of
DNS responses per second and so you very quickly determine the target
IP address and can apply a filter to remove DNS traffic to that
target.
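
As a rough illustration of the stateless, sampled analysis described here, the Python sketch below scales 1-in-N packet samples back up to an estimated per-destination rate. It is not sFlow-RT's implementation; the function name, the sample tuples and the 1-in-1000 rate are assumptions for the example.

```python
from collections import defaultdict

def estimate_bps_by_destination(samples, sampling_rate, interval_s):
    """Estimate bits/second per destination from 1-in-N packet samples.

    samples: iterable of (dst_ip, frame_bytes) for sampled packets
             seen during the interval (hypothetical input shape).
    sampling_rate: N, i.e. one packet in N is sampled.
    interval_s: length of the measurement interval in seconds.
    """
    sampled_bytes = defaultdict(int)
    for dst_ip, frame_bytes in samples:
        sampled_bytes[dst_ip] += frame_bytes
    # Each sample stands in for roughly N packets, so scale by N.
    return {
        dst: total * 8 * sampling_rate / interval_s
        for dst, total in sampled_bytes.items()
    }

# Hypothetical one-second window at 1-in-1000 sampling.
window = [("192.0.2.10", 500)] * 100 + [("192.0.2.20", 500)]
rates = estimate_bps_by_destination(window, sampling_rate=1000, interval_s=1.0)
print(max(rates, key=rates.get))  # destination with the highest estimate
```

No per-flow state is kept and no particular packet has to be seen; the estimate is only as good as the sample count, which is the lossiness being debated in this part of the thread.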

provided your device does sflow and can export to more than one
destination, sure.

This brings up an interesting use case for an OpenFlow capable
switch - replicating sFlow, NetFlow, IPFIX, Syslog, SNMP traps etc.
Many top of rack switches can also forward the traffic through a
GRE/VxLAN tunnel as well.

yes, more complexity seems like a great plan... in the words of
someone else: "I encourage my competitors to do this"

I think roland's other point that not very many people actually even
use sflow is not to be taken lightly here either.

-chris

http://blog.sflow.com/2013/11/udp-packet-replication-using-open.html

Domain Name: SFLOW.COM
<snip>
Registry Registrant ID:
Registrant Name: PHAAL, PETER
Registrant Organization: InMon Corp.
<snip>

