Unknown unicast flooding has long been a nagging problem in networks that have asymmetric routing and default timers. It occurs when the router needs to deliver a packet and still has an ARP entry for the destination host, but the switch no longer has a CAM entry for that host's MAC address. The result is a packet that's flooded to all of the ports in the VLAN.
The default (IOS) ARP timeout is 14,400 seconds (4 hours) and the default CAM timer is usually 300 seconds. In the 6500, with SXI code, the CAM aging timer has been increased to 480 seconds.
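To put rough numbers on the mismatch, here is a minimal Python sketch. It assumes a simplified model in which the switch never sees a frame from the host (as can happen with asymmetric routing), so the CAM entry ages out on schedule while the ARP entry stays valid until its own timeout. The function name is made up for illustration and is not a Cisco tool.

```python
def worst_case_flood_window(arp_timeout_s: int, cam_aging_s: int) -> int:
    """Seconds per ARP cycle in which the router still has an ARP entry
    but the switch may already have aged out the CAM entry.

    Simplified model (an assumption): the host is silent from the
    switch's point of view, so nothing refreshes the CAM entry early.
    """
    return max(0, arp_timeout_s - cam_aging_s)


# Classic IOS defaults: ARP 14400 s, CAM 300 s
print(worst_case_flood_window(14400, 300))  # 14100

# 6500 with SXI code: CAM aging raised to 480 s
print(worst_case_flood_window(14400, 480))  # 13920

# Timers set equal: no window in this model
print(worst_case_flood_window(300, 300))    # 0
```

With the defaults, the window in this model is almost the entire four-hour ARP lifetime, which is why the timers are worth aligning.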
The easiest fix is to make the ARP timeout equal to the CAM timeout. This allows the CAM table to be updated when the ARP entry is refreshed, so they shouldn't get out of sync:
int vlan [VLAN NUMBER]
arp timeout 300
If your L2 domains are very large, and you're concerned with too much ARP traffic, change the CAM timer:
mac-address-table aging-time 14400 vlan [VLAN NUMBER]
I prefer to change the ARP timer because it only needs to be done on the L3 interfaces supporting the VLAN. If you change the CAM timer, you should change it on all of the switches in the VLAN.
For extra protection, you can also look at implementing:
- Unknown Unicast Flood Blocking (UUFB)
switchport block unicast
- Unknown Unicast Flood Rate-Limiting (UUFRL) (Available on the PFC3C)
mls rate-limit layer2 unknown rate-in-pps [burst-size]
- CAM table synchronization between the DFC and PFC
mac-address-table synchronize
Luckily, with the Nexus 7000 and NX-OS, Cisco had a clean slate. The default ARP timeout is 1500 seconds and the default CAM aging time is 1800 seconds, so the CAM entry outlives the ARP entry.
For more information, here is a good resource:
http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2SX/configuration/guide/blocking.pdf
Woohoo, comments!
I found out about this issue when we installed some new server switches on our network and looked at the uplink interface statistics. We had spikes overnight of close to 100,000 PPS coming in - and the switch had NO devices on it!
I'm betting that a lot of organizations have unicast flooding problems, but their traffic levels are low enough that it goes unnoticed.
Pat yourself on the back for finding it before it was too late!
This document has a nice description of the underlying problem with L2/L3 timers and routing asymmetry:
http://www.cisco.com/en/US/docs/solutions/Enterprise/Campus/HA_campus_DG/hacampusdg.html#wp1108782
James, very nice article about unicast flooding. However, I have always wondered about the following: Cisco sometimes recommends setting the MAC and ARP timers equal, but I am not sure this would eliminate flooding completely. What happens when an ARP entry times out: does the switch expire the ARP entry at the timeout and then re-ARP? And what happens while the switch is waiting for the ARP reply? Does it drop packets until the ARP reply is received, or does it flood? I would re-ARP one second before the ARP entry really expires, for example, but I don't know if Cisco IOS switches do this also :-)
I tried to use some debugs to get a specific answer, but 9 debug events show up with 2 different timestamps (down to the msec).
The order of a few things (with identical timestamps) had me scratching my head and I started to question whether the printed order correlates to IOS processing.
>I would re-ARP one second before
>the ARP entry really expires
If this is an issue, my debugs suggest you'll need more than a 1s buffer. Here are two timestamps from the start of the arp refresh process (10s timeout), for 1 IP:
09:50:27.040
09:50:38.304
I would like to lab this up.
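For reference, the gap between the two debug timestamps quoted above can be worked out with a short Python sketch. The helper name is hypothetical, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # hours:minutes:seconds.fraction, as printed in the debugs


def gap_seconds(start: str, end: str) -> float:
    """Elapsed seconds between two same-day timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# The two timestamps from the ARP refresh debug (10s timeout)
print(gap_seconds("09:50:27.040", "09:50:38.304"))  # 11.264
```

The two events are about 11.26 seconds apart, a little over a second more than the configured 10s timeout.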
sorry to revive this old thread...
I had an unknown unicast flooding problem recently.
When trying to hook up a NAS with bonded interfaces to two uplink switches, a UUF situation showed up during load testing.
Why it happened is still unclear, although I suspect the load-balancing algorithm, which may have created an asymmetric traffic flow.
now the cisco related question:
I can change the mac address global aging time on a catalyst switch.
but how can I check the current aging time for learned mac addresses?
Eric: "show mac address-table" will display the age for an entry.
that would work on a 6500 platform
but it seems 2960's do not give the age detail per learned mac address
Yes, that seems to be correct. I don't see the command to do this on a 2960. I wouldn't be surprised if the information is buried in a "show platform mac-address-table" command.
To elaborate more on the buffer time between the ARP timeout and the MAC aging time: if you have 6509s with DFC cards, then you have additional considerations. The DFCs synchronize their CAM tables at a default interval of 160 seconds. This means there is a lag between when a MAC address is learned by a DFC and when it is synchronized to the other DFCs and the SP.
Beyond that, if you have a busy system the replication can actually take multiple synchronization cycles before it is successful.
Given that, the recommendation is to set the MAC aging time to (ARP + 3(DFC sync time)).
Given a default config this translates to:
(14400 + 3(160)) = 14880
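The same arithmetic as a small Python sketch, in case you run non-default timers. The function name and keyword defaults are illustrative only; the 160-second interval and the factor of 3 come from the recommendation above.

```python
def recommended_mac_aging(arp_timeout_s: int,
                          dfc_sync_interval_s: int = 160,
                          sync_cycles: int = 3) -> int:
    """MAC aging time = ARP timeout + (sync cycles x DFC sync interval)."""
    return arp_timeout_s + sync_cycles * dfc_sync_interval_s


# Default ARP timeout and default DFC sync interval
print(recommended_mac_aging(14400))  # 14880

# Example with the ARP timeout lowered to 300 seconds
print(recommended_mac_aging(300))    # 780
```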
Show commands:
sh mac address-table synch stat (shows the timers, etc)
sh mod | i DFC (shows which modules are DFCs)
Config command:
mac address-table synch activity-time