Equal Cost Multipathing – ECMP

Coarse-grained ECMP

BGP supports equal-cost multipathing (ECMP). BGP typically chooses one best path for each prefix and installs only that route in the forwarding table. With ECMP, a BGP node that hears a particular prefix from multiple peers can program the routing table to forward traffic for that prefix through all of these peers.
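To make the idea of forwarding one prefix through several peers concrete, here is a minimal sketch of per-flow next-hop selection. It assumes the common approach of hashing the flow 5-tuple; the function name `pick_next_hop`, the SHA-256 hash, and the addresses are illustrative assumptions, not the hash that a SONiC switch ASIC actually uses.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Map a flow 5-tuple to one of the equal-cost next hops.

    Illustrative only: real switches compute the hash in hardware.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Hypothetical peers that all advertise the same prefix.
peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# (source IP, destination IP, protocol, source port, destination port)
flow = ("192.0.2.10", "198.51.100.20", 6, 49152, 443)

# The same flow always maps to the same peer, so packets stay in order.
print(pick_next_hop(flow, peers))
```

Different flows may land on different peers, which is what spreads the load across the links.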

In SONiC, the BGP multipath option is enabled by default with a maximum of 64 paths, so the switch can install multiple equal-cost BGP paths in the forwarding table and load balance traffic across multiple links. You can change the maximum number of paths to suit your needs.

The following example changes the maximum number of paths to 100. The value ranges from 1 to 256; a value of 1 disables the BGP multipath option.

admin@nba610-1:~$ vtysh
nba610-1# configure terminal
nba610-1(config)# router bgp 65101
nba610-1(config-router)# address-family ipv4 unicast
nba610-1(config-router-af)# maximum-paths 100
nba610-1(config-router-af)# end
nba610-1# write
Note: this version of vtysh never writes vtysh.conf
Building Configuration...
Configuration saved to /etc/frr/zebra.conf
Configuration saved to /etc/frr/bgpd.conf
Configuration saved to /etc/frr/staticd.conf

When BGP multipath is enabled, only BGP routes learned from the same AS are load balanced. Routes learned from neighbors in different ASes are not load balanced, even if their AS path lengths are equal. To load balance across paths received from neighbors in different ASes, set the bgp bestpath as-path multipath-relax option.

admin@nba610-1:~$ vtysh
nba610-1# configure terminal
nba610-1(config)# router bgp 65101
nba610-1(config-router)# bgp bestpath as-path multipath-relax
nba610-1(config-router)# end
nba610-1# write
Note: When you disable the bgp bestpath as-path multipath-relax option, EVPN type-5 routes ignore the updated configuration. Type-5 routes continue to use all available ECMP paths in the underlay fabric, regardless of ASN.
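The effect of the multipath-relax option can be sketched as an eligibility check. This is a simplified model under stated assumptions: it looks only at the AS path, treats "same AS" as the same neighboring (first) AS, and ignores every other attribute that BGP best-path selection compares. An implementation may compare the entire AS path rather than only the first AS. The function name, peers, and AS numbers are hypothetical.

```python
def multipath_candidates(best, others, relax=False):
    """Return the paths that may share load with the best path.

    Each path is a dict with 'peer' and 'as_path' (a list of AS numbers).
    Only the AS-path rule is modeled here.
    """
    eligible = [best]
    for path in others:
        same_length = len(path["as_path"]) == len(best["as_path"])
        same_neighbor_as = path["as_path"][:1] == best["as_path"][:1]
        # Without relax, the neighbor AS must match; with relax, length suffices.
        if same_length and (relax or same_neighbor_as):
            eligible.append(path)
    return eligible

best = {"peer": "10.0.0.1", "as_path": [65001, 65100]}
others = [
    {"peer": "10.0.0.2", "as_path": [65001, 65100]},         # same neighbor AS
    {"peer": "10.0.0.3", "as_path": [65002, 65100]},         # different AS, same length
    {"peer": "10.0.0.4", "as_path": [65003, 65200, 65100]},  # longer AS path
]

strict = [p["peer"] for p in multipath_candidates(best, others)]
relaxed = [p["peer"] for p in multipath_candidates(best, others, relax=True)]
print("default:", strict)
print("multipath-relax:", relaxed)
```

In this model the path through 10.0.0.3 joins the ECMP set only when relax is set, and the longer path through 10.0.0.4 is never eligible.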

Fine-grained ECMP

Fine-grained equal cost multipathing (FG-ECMP) provides a mechanism to configure consistent and layered hashing.

Note: SONiC supports up to 8 FG-ECMP groups.

Regular ECMP splits traffic at flow granularity. This can cause load imbalance across paths and poor utilization of network resources. For example, if a node fails, ECMP has to rehash all the flows, and most of them end up on a different next hop. Fine-grained ECMP exposes a large, fixed-size ECMP table to the application, so that only the entries that pointed to the failed next hop are remapped to the remaining healthy ones.

Currently, there is no CLI available to configure FG-ECMP. Commands will be added later.

FG-ECMP relies on the following new tables in the config_db.json file: FG_NHG, FG_NHG_MEMBER, and FG_NHG_PREFIX.
Unless otherwise stated, the attributes are mandatory.

The FG_NHG table fields:

bucket_size: The total desired hash bucket size. The recommended value is the least common multiple of 1..{maximum number of next hops}.
match_mode: The filtering method used to identify when to use fine-grained or regular route handling.
    nexthop-based: Looks at the next hop IP address to filter routes and uses fine-grained ECMP when the next hop IP address matches an FG_NHG_MEMBER IP.
    route-based: Looks at the prefix to filter routes and uses fine-grained ECMP when the route prefix matches an FG_NHG_PREFIX prefix.
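The recommended bucket_size value above can be computed with a few lines of Python. The helper name `recommended_bucket_size` is hypothetical. The likely rationale for the recommendation is that a table sized this way divides evenly among any number of active next hops up to the maximum.

```python
from functools import reduce
from math import gcd

def recommended_bucket_size(max_next_hops):
    """Least common multiple of 1..max_next_hops."""
    return reduce(
        lambda acc, n: acc * n // gcd(acc, n),
        range(1, max_next_hops + 1),
        1,
    )

for n in (2, 3, 4, 6):
    print(n, "next hops ->", recommended_bucket_size(n), "buckets")
```

For example, with a maximum of 4 next hops the result is 12, and 12 buckets split evenly whether 1, 2, 3, or 4 next hops are active.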

The FG_NHG_MEMBER table fields:

bank: An index that specifies the bank (group) within which redistribution occurs.
link: A link associated with the next hop IP address. If configured, it enables next hop withdrawal or addition when the link's operational state changes.
FG_NHG: A reference to an entry in the FG_NHG table, given as the next-hop group name.

The FG_NHG_PREFIX table field:

FG_NHG: The fine-grained next-hop group name.

Configuration example

Six firewalls advertise VIP 10.10.10.10. Each set of three firewalls forms a group that shares state:

Firewall VM set 1 next-hops: 1.1.1.1, 1.1.1.2, 1.1.1.3
Firewall VM set 2 next-hops: 1.1.1.4, 1.1.1.5, 1.1.1.6
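Because the firewalls in a set share state, flows from a failed firewall should move only to the other members of the same set, which is what the bank index expresses. The following is a minimal sketch of that behavior, assuming set 1 is bank 0, set 2 is bank 1, and 12 buckets are split evenly between the banks and then round-robin within each bank. The exact bucket layout that SONiC programs is not described here, so treat the layout and the helper names as assumptions.

```python
def initial_buckets(members, bucket_size):
    """Build the bucket table as a list of (bank, next_hop) pairs.

    members maps each next hop to its bank index.
    """
    banks = sorted(set(members.values()))
    per_bank = bucket_size // len(banks)
    table = []
    for bank in banks:
        hops = [nh for nh, b in members.items() if b == bank]
        for i in range(per_bank):
            table.append((bank, hops[i % len(hops)]))
    return table

def fail_next_hop(table, members, failed):
    """Move the failed next hop's buckets to survivors in the same bank."""
    bank = members[failed]
    survivors = [nh for nh, b in members.items() if b == bank and nh != failed]
    if not survivors:
        raise ValueError("this sketch does not model an entire bank failing")
    result, turn = [], 0
    for b, nh in table:
        if nh == failed:
            nh = survivors[turn % len(survivors)]
            turn += 1
        result.append((b, nh))
    return result

members = {
    "1.1.1.1": 0, "1.1.1.2": 0, "1.1.1.3": 0,  # firewall VM set 1
    "1.1.1.4": 1, "1.1.1.5": 1, "1.1.1.6": 1,  # firewall VM set 2
}

before = initial_buckets(members, 12)
after = fail_next_hop(before, members, "1.1.1.1")

for index, (old, new) in enumerate(zip(before, after)):
    marker = "  <- remapped" if old != new else ""
    print(index, old[1], "->", new[1], marker)
```

In this model only the two buckets that pointed to 1.1.1.1 change, and they move to 1.1.1.2 and 1.1.1.3. The buckets of set 2 are untouched.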

Configure Route-based Mode

admin@nba610-1:~$ sudo nano /etc/sonic/config_db.json
{
       "FG_NHG": {
               "2-VM-Sets": {
                       "bucket_size": 12,
                       "match_mode": "route-based"
               }
       },
       "FG_NHG_PREFIX": {
               "10.10.10.10/32": {
                       "FG_NHG": "2-VM-Sets"
               }
       },
       "FG_NHG_MEMBER": {
               "1.1.1.1": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 0,
                       "link": "Ethernet4"
               },
               "1.1.1.2": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 0,
                       "link": "Ethernet8"
               },
               "1.1.1.3": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 0,
                       "link": "Ethernet12"
               },
               "1.1.1.4": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 1,
                       "link": "Ethernet16"
               },
               "1.1.1.5": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 1,
                       "link": "Ethernet20"
               },
               "1.1.1.6": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 1,
                       "link": "Ethernet24"
               }
       }
}
admin@nba610-1:~$ sudo config save -y

Configure Next Hop-based Mode

admin@nba610-1:~$ sudo nano /etc/sonic/config_db.json
{
       "FG_NHG": {
               "2-VM-Sets": {
                       "bucket_size": 12,
                       "match_mode": "nexthop-based"
               }
       },
       "FG_NHG_MEMBER": {
               "1.1.1.1": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 0,
                       "link": "Ethernet4"
               },
               "1.1.1.2": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 0,
                       "link": "Ethernet8"
               },
               "1.1.1.3": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 0,
                       "link": "Ethernet12"
               },
               "1.1.1.4": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 1,
                       "link": "Ethernet16"
               },
               "1.1.1.5": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 1,
                       "link": "Ethernet20"
               },
               "1.1.1.6": {
                       "FG_NHG": "2-VM-Sets",
                       "bank": 1,
                       "link": "Ethernet24"
               }
       }
}
admin@nba610-1:~$ sudo config save -y
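Since the tables are edited by hand, a quick consistency check before applying them can catch typos. The following is a hedged sketch, not a SONiC tool: the function `check_fg_ecmp` and its rules are assumptions derived from the field descriptions above (valid match_mode values, members and prefixes referencing an existing FG_NHG entry).

```python
import ipaddress

VALID_MODES = {"nexthop-based", "route-based"}

def check_fg_ecmp(config):
    """Return a list of problems found in the FG-ECMP tables of a config dict."""
    problems = []
    groups = config.get("FG_NHG", {})
    for name, attrs in groups.items():
        if attrs.get("match_mode") not in VALID_MODES:
            problems.append(f"{name}: unknown match_mode {attrs.get('match_mode')!r}")
        size = attrs.get("bucket_size")
        if not isinstance(size, int) or size <= 0:
            problems.append(f"{name}: bucket_size must be a positive integer")
    for next_hop, attrs in config.get("FG_NHG_MEMBER", {}).items():
        try:
            ipaddress.ip_address(next_hop)
        except ValueError:
            problems.append(f"{next_hop}: not a valid IP address")
        if attrs.get("FG_NHG") not in groups:
            problems.append(f"{next_hop}: references unknown group {attrs.get('FG_NHG')!r}")
    for prefix, attrs in config.get("FG_NHG_PREFIX", {}).items():
        try:
            ipaddress.ip_network(prefix, strict=False)
        except ValueError:
            problems.append(f"{prefix}: not a valid prefix")
        if attrs.get("FG_NHG") not in groups:
            problems.append(f"{prefix}: references unknown group {attrs.get('FG_NHG')!r}")
    return problems

# A shortened version of the route-based example, with one deliberate typo.
sample = {
    "FG_NHG": {"2-VM-Sets": {"bucket_size": 12, "match_mode": "route-based"}},
    "FG_NHG_PREFIX": {"10.10.10.10/32": {"FG_NHG": "2-VM-Sets"}},
    "FG_NHG_MEMBER": {
        "1.1.1.1": {"FG_NHG": "2-VM-Sets", "bank": 0, "link": "Ethernet4"},
        "1.1.1.4": {"FG_NHG": "2-VM-Set", "bank": 1, "link": "Ethernet16"},
    },
}

for problem in check_fg_ecmp(sample):
    print(problem)
```

To check a real file, the dictionary could be loaded with `json.load` from a copy of config_db.json and passed to the same function.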

Show the FG-ECMP Configuration

show fgnhg hash-view {fg-nhg-group-name} displays fine-grained ECMP information in a hash view.

admin@nba610-1:~$ show fgnhg hash-view {fg-nhg-group-name}
+-----------------+--------------------+----------------+
| FG_NHG_PREFIX   | Next Hop           | Hash buckets   |
+=================+====================+================+
| 100.50.25.12/32 | 200.200.200.4      | 0              |
|                 |                    | 1              |
|                 |                    | 2              |
|                 |                    | 3              |
|                 |                    | 4              |
|                 |                    | 5              |
|                 |                    | 6              |
|                 |                    | 7              |
|                 |                    | 8              |
|                 |                    | 9              |
|                 |                    | 10             |
|                 |                    | 11             |
|                 |                    | 12             |
|                 |                    | 13             |
|                 |                    | 14             |
|                 |                    | 15             |
+-----------------+--------------------+----------------+

show fgnhg active-hops {fg-nhg-group-name} displays fine-grained ECMP active next hops.

admin@nba610-1:~$ show fgnhg active-hops {fg-nhg-group-name}
+-----------------+--------------------+
| FG_NHG_PREFIX   | Active Next Hops   |
+=================+====================+
| 100.50.25.12/32 | 200.200.200.4      |
|                 | 200.200.200.5      |
+-----------------+--------------------+
| fc:5::/128      | 200:200:200:200::4 |
|                 | 200:200:200:200::5 |
+-----------------+--------------------+
Note: {fg-nhg-group-name} is an optional parameter containing the user-defined alias of the FG_NHG group name found in the FG_NHG_PREFIX section of the config DB. If it is specified, the output displays active next hops from the specified group only. If it is not specified, active next hops from all groups are displayed.