nanog mailing list archives

Re: Unicast Flooding


From: Steven King <sking () kingrst com>
Date: Wed, 17 Jun 2009 22:39:42 -0400

I wouldn't consider this a defect. Historically, L2 and L3 devices have
always been separate. In an L3 switch, those functions are simply
combined into one device. In Cisco devices that support CEF, the CEF
table is used to make all forwarding decisions, but the CEF table is
dependent on the ARP and routing tables on the L3 side. When it comes to
forwarding the frame out the proper interface, the CAM table comes into
play. If that table times out sooner than the L3 tables, there will be
times when the CAM table is incomplete.
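To make the two-stage lookup concrete, here is a toy Python model. It is a simplification (real hardware precomputes adjacencies rather than doing dictionary lookups), and the function and table names are hypothetical. The ARP table resolves IP to MAC, then the CAM table resolves MAC to port; a live ARP entry with a missing CAM entry is the unknown-unicast case.

```python
from typing import Dict, Optional


def forward(dst_ip: str,
            arp_table: Dict[str, str],
            cam_table: Dict[str, int]) -> str:
    """Hypothetical model of how an L3 switch handles a routed packet."""
    # L3 step: ARP maps the destination IP to a MAC address.
    mac: Optional[str] = arp_table.get(dst_ip)
    if mac is None:
        # No ARP entry: the switch must broadcast an ARP request first.
        return "arp-request"
    # L2 step: CAM maps the MAC address to an egress port.
    port: Optional[int] = cam_table.get(mac)
    if port is None:
        # ARP entry still valid but CAM entry aged out: unknown unicast flood.
        return "flood"
    return f"unicast:port{port}"


arp = {"10.0.0.5": "00:11:22:33:44:55"}
print(forward("10.0.0.5", arp, {"00:11:22:33:44:55": 7}))  # unicast:port7
print(forward("10.0.0.5", arp, {}))                         # flood
print(forward("10.0.0.9", arp, {}))                         # arp-request
```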

This is mostly seen in redundant gateway setups. Inbound traffic is
usually load balanced between the two redundant devices. The gateways
learn about the servers/workstations from traffic leaving the VLAN, not
traffic coming into the VLAN. In the case of HSRP/VRRP, the
servers/workstations use only one of the two redundant devices to send
traffic out of the VLAN. As a result, the other device ends up with
incomplete information every 5 minutes (the default MAC aging timer).
Traffic coming into the VLAN (usually load balanced with EIGRP or OSPF)
through the standby device is then flooded as unknown unicast out all
of its ports in the VLAN.
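The asymmetry can be sketched with a small per-second simulation. The assumptions here are hypothetical and simplified: both gateways learn the host MAC at t=0, the host's outbound traffic goes only through the active gateway (refreshing only that CAM entry), and one inbound packet per second alternates between the two gateways.

```python
from typing import Dict


def count_floods(duration_s: int, cam_aging_s: int) -> Dict[str, int]:
    """Count flooded inbound packets per gateway (hypothetical model)."""
    last_seen = {"active": 0, "standby": 0}  # when each side last saw the host MAC
    floods = {"active": 0, "standby": 0}
    for t in range(duration_s):
        # Outbound frames from the host keep refreshing only the active side.
        last_seen["active"] = t
        # Inbound traffic alternates between gateways (ECMP-style).
        gw = "active" if t % 2 == 0 else "standby"
        if t - last_seen[gw] >= cam_aging_s:
            floods[gw] += 1  # CAM entry aged out: unknown unicast flood
    return floods


print(count_floods(3600, 300))  # {'active': 0, 'standby': 1650}
```

Only the standby side floods, and it starts as soon as the 300-second aging timer expires.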

Making the L2 and L3 timers the same corrects this. It works because,
for CEF to make a forwarding decision, the layer 3 engine must send an
ARP request if the ARP entry is not present. This causes an ARP
broadcast. When the ARP reply is returned, the active and standby
devices can both keep their CAM/ARP/CEF tables up to date.
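The effect of matching the timers can be sketched as follows. This model is an assumption: steady inbound traffic arrives every second at the standby device, which relearns the host MAC only from the ARP exchange it triggers when its ARP entry expires, and that reply refreshes both the ARP and CAM entries. The 14400-second value is the ARP timeout cited later in this thread.

```python
def simulate(arp_timeout: int, cam_aging: int, horizon: int) -> int:
    """Count seconds in which inbound traffic is flooded (hypothetical model)."""
    arp_learned = cam_learned = 0
    flooded = 0
    for t in range(horizon):
        if t - arp_learned >= arp_timeout:
            # ARP expired: request/reply exchange refreshes both tables.
            arp_learned = cam_learned = t
        elif t - cam_learned >= cam_aging:
            # ARP still valid but CAM aged out: unknown unicast flood.
            flooded += 1
    return flooded


print(simulate(14400, 300, 14400))    # 14100 (default 5-minute MAC aging)
print(simulate(14400, 14400, 28800))  # 0 (timers matched)
```

With mismatched timers, almost the whole ARP lifetime is spent flooding; with matched timers, the CAM entry never ages out before the ARP refresh repopulates it.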

Although I do not consider it a defect that these timers are not
synchronized by default, I do agree that synchronizing them would be
very beneficial and would prevent a lot of confusion and hours of
troubleshooting for unsuspecting engineers trying to figure out why
they have a ton of unknown unicast packets.

Just my additional 0.02

Holmes,David A wrote:
In a layer 3 switch I consider unicast flooding due to an L2 CAM table timeout a design defect. To test vendors' L3 
switches for this defect, we have used a traffic generator to send 50-100 Mbps of pings to a device that does not 
reply to the pings, where the L3 switch was routing from one VLAN to another to forward the pings. In defective 
devices the L2 CAM table entry expires, causing the 50-100 Mbps unicast stream to be flooded out all ports in the 
destination VLAN. In my view, the L3 and L2 forwarding state machines must be synchronized such that L3 forwarding 
continues as long as packets are entering the L3 switch on one VLAN and exiting the switch on another VLAN via 
routing. Gratuitous ARPs seem to be a workaround that resets the CAM entry timeout interval, but they are not 
an elegant solution.
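A quick back-of-the-envelope sketch of why this test is harsh: because the stream is routed in from another VLAN, every port in the destination VLAN receives a full copy. The 48-port VLAN and the port speeds below are hypothetical figures for illustration only.

```python
from typing import Tuple


def flood_impact(stream_mbps: float, ports_in_vlan: int,
                 port_speed_mbps: float) -> Tuple[float, float]:
    """Return (aggregate egress Mbps, per-port utilization) for a flooded stream."""
    total = stream_mbps * ports_in_vlan      # one copy per port in the VLAN
    per_port = stream_mbps / port_speed_mbps  # share of each port's capacity
    return total, per_port


print(flood_impact(100, 48, 100))   # (4800, 1.0)  saturates 100 Mbps ports
print(flood_impact(100, 48, 1000))  # (4800, 0.1)
```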

-----Original Message-----
From: Matthew Huff [mailto:mhuff () ox com] 
Sent: Wednesday, June 17, 2009 2:58 PM
To: 'Brian Shope'; 'nanog () nanog org'
Subject: RE: Unicast Flooding

Unicast flooding is a common occurrence in large datacenters, especially with asymmetrical paths caused by different 
first hop routers (via HSRP, VRRP, etc.). We ran into this some time ago. Most ARP-sensitive systems, such as clusters, 
HSRP, and content switches, are smart enough to send out gratuitous ARPs, which eliminates worries about increasing 
the timeouts. We haven't had any issues since we made the changes.
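For reference, here is a minimal sketch of what a gratuitous ARP looks like on the wire, built from the RFC 826 field layout. Because it is sent to the Ethernet broadcast address, every switch in the VLAN (including the standby gateway's) relearns the sender's source MAC, which is why it refreshes CAM entries. Sending the frame would need a raw socket, which is left out. The MAC and IP values in the example are placeholders.

```python
import socket
import struct


def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request frame (Ethernet + ARP)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    proto = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, EtherType 0x0806 (ARP).
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender and target IP are both our own; target MAC is zeroed.
    arp += hw + proto + b"\x00" * 6 + proto
    return eth + arp  # 42 bytes; the NIC pads to the 60-byte minimum


frame = gratuitous_arp("00:11:22:33:44:55", "192.0.2.10")
print(len(frame))  # 42
```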

After debugging the problem, we added "mac-address-table aging-time 14400" to our data center switches. That syncs the 
MAC aging time to the same timeout value as the ARP timeout (14400 seconds, i.e. 4 hours).

----
Matthew Huff       | One Manhattanville Rd
OTA Management LLC | Purchase, NY 10577
http://www.ox.com  | Phone: 914-460-4039
aim: matthewbhuff  | Fax:   914-460-4139


  
-----Original Message-----
From: Brian Shope [mailto:blackwolf99999 () gmail com]
Sent: Wednesday, June 17, 2009 5:33 PM
To: nanog () nanog org
Subject: Unicast Flooding

Recently, while running a packet capture, I came across some unicast
flooding that was happening on my network. One of our core switches
didn't have the MAC address for a server and was flooding all packets
destined to that server. It wasn't learning the MAC address because the
server was responding to packets out a different network card on a
different switch. The flooding I was seeing wasn't enough to cause any
network issues (it was only a few megs), but it was something that I
wanted to fix.

I've run into this issue before and solved it by statically entering
the MAC address into the CAM tables.

I want to avoid this problem in the future, and I'm looking at two
different things.

The first is preventing it in the first place. Along those lines, I've
seen some recommendations online about changing the ARP and CAM timeouts
to be the same. However, there seems to be disagreement on which is
better: making the ARP timeouts match the CAM table timeouts, or vice
versa. Also, when talking about this, everyone seems to consider only
routers, but what about the timers on a firewall? I'm worried that I
might cause other issues by changing these timers.

The second thing I'm considering is monitoring. I'd like to set up
something to monitor for any excessive unicast flooding in the future. I
understand that a little unicast flooding is normal, as the switch has to
do a little bit of flooding to find out where hosts are. While looking
for a way to monitor this, I came across the
'mac-address-table unicast-flood' command on Cisco switches. This looked
perfect for what I needed, but apparently it is currently not an option
on 6500 switches with Sup720s. Since there doesn't appear to be an
option on Cisco that monitors specifically for unicast floods, I thought
that maybe I could set up a server with a network card in promiscuous
mode and then keep stats of all packets received that aren't destined
for the server and that also aren't legitimate broadcasts or multicasts.
The only problem with that is that I don't want to have to build a
completely custom solution. I was hoping that someone may have already
created something like this, or that maybe there is a good reporting
tool for Wireshark or something similar that could generate the report
that I want.
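The core of the promiscuous-mode idea is small. Below is a minimal sketch, assuming raw Ethernet frames captured on an ordinary access port (not a SPAN/mirror port, where seeing other hosts' traffic is expected); reading frames from a pcap file or live capture is left out. A frame counts as flooded unicast if its destination MAC is not ours and its I/G bit (least significant bit of the first octet) is clear, which excludes both multicast and broadcast.

```python
from collections import Counter
from typing import Iterable, Set


def is_group(dst: bytes) -> bool:
    # I/G bit set means multicast; ff:ff:ff:ff:ff:ff (broadcast) also has it set.
    return bool(dst[0] & 0x01)


def flooded_unicast(frames: Iterable[bytes], my_macs: Set[bytes]) -> Counter:
    """Count received frames per destination MAC that were unicast but not for us."""
    counts: Counter = Counter()
    for frame in frames:
        dst = frame[:6]  # Ethernet destination address
        if len(dst) < 6 or is_group(dst) or dst in my_macs:
            continue
        counts[dst.hex(":")] += 1
    return counts


me = bytes.fromhex("001122334455")
other = bytes.fromhex("00aabbccddee")
pad = b"\x00" * 8
frames = [me + pad, b"\xff" * 6 + pad, bytes.fromhex("01005e000001") + pad,
          other + pad, other + pad]
print(flooded_unicast(frames, {me}))  # Counter({'00:aa:bb:cc:dd:ee': 2})
```

Keying the counter by destination MAC points directly at the hosts whose CAM entries are missing somewhere; a per-interval total of this counter could then be graphed or alerted on with whatever threshold seems reasonable for the network.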

Anyone have any suggestions on either prevention/monitoring?

Thanks!!

-Brian
    


  

-- 
Steve King

Network Engineer - Liquid Web, Inc.
Cisco Certified Network Associate
CompTIA Linux+ Certified Professional
CompTIA A+ Certified Professional


