EtherChannel Buckets

EtherChannel is a great tool for adding redundancy and bandwidth to your infrastructure.  It does come with a few caveats regarding hash bucket reassignment and packet loss, which don't seem to be widely discussed.

At the heart of Cisco's EtherChannel is a hashing algorithm that assigns each frame to one port in the channel.  The hashing algorithm has 8 possible values/buckets (0-7), and these are assigned to the physical ports.  With only 8 possible results, any bundle whose link count is not a power of 2 gets unequal load balancing, because the result buckets can't be evenly distributed among the member ports.  Here are a few examples of this:

       3 Port EtherChannel:
        Port 1: 3 Hash Result Buckets
        Port 2: 3 Hash Result Buckets
        Port 3: 2 Hash Result Buckets

       6 Port EtherChannel:
        Port 1: 2 Hash Result Buckets
        Port 2: 2 Hash Result Buckets
        Port 3: 1 Hash Result Bucket
        Port 4: 1 Hash Result Bucket
        Port 5: 1 Hash Result Bucket
        Port 6: 1 Hash Result Bucket
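
The per-port counts above fall out of dividing 8 buckets as evenly as possible.  Here is a minimal Python sketch of that arithmetic.  It assumes the leftover buckets go to the lowest-numbered ports (which matches the examples above, though the real bucket-to-port mapping is platform specific) and that traffic hashes uniformly across buckets.  The names `bucket_counts` and `load_shares` are made up for illustration and are not anything from IOS.

```python
def bucket_counts(num_ports: int, num_buckets: int = 8) -> list[int]:
    """Return how many hash result buckets each member port receives."""
    if not 1 <= num_ports <= num_buckets:
        raise ValueError("need between 1 and num_buckets member ports")
    base, extra = divmod(num_buckets, num_ports)
    # Assumption: the first `extra` ports each take one leftover bucket.
    return [base + 1 if port < extra else base for port in range(num_ports)]


def load_shares(num_ports: int, num_buckets: int = 8) -> list[float]:
    """Fraction of traffic per port, assuming every bucket is equally loaded."""
    counts = bucket_counts(num_ports, num_buckets)
    return [count / num_buckets for count in counts]


if __name__ == "__main__":
    for ports in (3, 4, 6):
        print(ports, bucket_counts(ports), load_shares(ports))
```

On a 3 port bundle this gives two ports 37.5% of the buckets each and the third only 25%, even though an even split would be about 33.3% per port.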

It's also important to consider failure scenarios!  If you lose 1 member link of a 4 port EtherChannel, you end up with the uneven 3 port distribution.  This unequal distribution, coupled with the 25% reduction in capacity, can seriously impact production traffic if you're not careful.  Luckily, with LACP, you can assign hot-standby ports to take over for a failed member link and keep the optimal number of links in the bundle.  The load balancing inequality itself is being fixed with an algorithm update that allows for 256 result buckets.
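
To get a feel for why more buckets help, the sketch below compares the busiest port's share of buckets with 8 versus 256 buckets.  It is only an illustration: it assumes the 256-bucket algorithm spreads buckets across ports as evenly as possible and that traffic hashes uniformly, neither of which is confirmed here.  `busiest_port_share` is a hypothetical helper name.

```python
def busiest_port_share(num_ports: int, num_buckets: int) -> float:
    """Share of all buckets held by the most heavily loaded port."""
    if not 1 <= num_ports <= num_buckets:
        raise ValueError("need between 1 and num_buckets member ports")
    base, extra = divmod(num_buckets, num_ports)
    # If the buckets don't divide evenly, at least one port carries one more.
    most = base + 1 if extra else base
    return most / num_buckets


if __name__ == "__main__":
    for buckets in (8, 256):
        for ports in (3, 4):
            share = busiest_port_share(ports, buckets)
            ideal = 1 / ports
            print(f"{buckets:3d} buckets, {ports} ports: "
                  f"busiest {share:.2%} vs ideal {ideal:.2%}")
```

Under those assumptions, a 4 port bundle that drops to 3 ports leaves the busiest port with 37.5% of the buckets in the 8-bucket scheme, but only about 33.6% with 256 buckets.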

When ports are added to or removed from a port channel (automatically or manually), the result buckets need to be reassigned.  When this happens, all of the buckets are removed from the active ports (causing packet loss) and then reassigned.  This happens at the port ASIC level.  The amount of packet loss is platform dependent, because it comes down to how long the port ASIC update takes.

To demonstrate this loss, I ran a test on a 3750E where I pushed 1 Gbps of traffic through a 20 Gbps (2 x 10 Gbps) EtherChannel, then removed and added links in the bundle.  To rule out packet loss induced by the PHY detecting loss of signal, I ran the test by shutting down the port that was not carrying my 1 Gbps stream.  With 64 byte frames, I never lost more than 85 frames (which is about 0.06 ms of loss).  While this may not seem like a lot of loss, consider the implications during link flapping problems.
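
For anyone who wants to check the frames-to-time conversion, here is a rough Python sketch.  It assumes the 1 Gbps figure is the on-the-wire rate, so each 64 byte frame also costs the standard 8 bytes of preamble/SFD and 12 bytes of inter-frame gap.  The function names are illustrative only.

```python
PREAMBLE_SFD_BYTES = 8     # Ethernet preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum idle time between frames, in byte times


def frames_per_second(rate_bps: float, frame_bytes: int) -> float:
    """Frame rate for a stream, counting per-frame wire overhead."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return rate_bps / wire_bits


def loss_duration_ms(frames_lost: int, rate_bps: float, frame_bytes: int) -> float:
    """Time window, in milliseconds, that the lost frames represent."""
    return frames_lost / frames_per_second(rate_bps, frame_bytes) * 1000


if __name__ == "__main__":
    fps = frames_per_second(1_000_000_000, 64)
    print(f"{fps:,.0f} frames per second")
    print(f"{loss_duration_ms(85, 1_000_000_000, 64):.4f} ms of loss")
```

Under that assumption the stream runs at roughly 1.49 million frames per second, so 85 lost frames works out to a little under 0.06 ms.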

Catalyst 6500s (SXH and newer code) and the Nexus switches now include a feature called adaptive hash distribution.  This adaptive distribution algorithm does not cause packet loss during bucket reassignment.  I hope this feature makes its way down into the smaller platforms.
