Principle of MLAG



Definition

MLAG (Multi-chassis Link Aggregation Group), as the name suggests, extends LAG (Link Aggregation Group) technology so that the member ports of one link aggregation reside on a pair of devices, which appear to the downstream device as a single Layer 2 device. The figure below shows the physical and logical Layer 2 topologies of an MLAG network. The two MLAG peer devices, SwitchA1 and SwitchA2, communicate by exchanging MLAG control plane messages, and they keep the MAC addresses learned on their LAG interfaces synchronized using L2 multicast packets. The downstream device can be any endpoint equipment (an L2 switch or a server) that supports LACP link aggregation. When it is dual-homed to the network through the MLAG peer devices, it is unaware that two devices are connected at the other end of its links.

Figure 1. Physical Topology and Logical Topology of the MLAG Networking

MLAG is mainly used in scenarios where a downstream switch or host needs dual access to the network. In Figure 1, suppose that before MLAG is deployed, SwitchB is single-homed to the network through SwitchA1 with spanning tree enabled. If SwitchA1 or its link fails, SwitchB can no longer communicate with the network. With MLAG, the downstream switch or host is dual-homed to the network through SwitchA1 and SwitchA2, which provides link-level and device-level redundancy and protection.

This gives the downstream switch or host two uplink paths for redundancy and also allows full bandwidth utilization: because the MLAG domain appears to the Spanning Tree Protocol (STP) as a single switch, no ports are blocked.

MLAG offers the following advantages, which make it suitable for building a highly resilient and highly reliable Layer 2 network.

  • Increased Bandwidth

MLAG aggregates multiple Ethernet ports across two switches, which increases the uplink bandwidth. The maximum bandwidth of the link aggregation interface is the sum of the bandwidths of the individual MLAG member ports.

  • Higher Reliability

Both peer devices work simultaneously. When a link or device fails, traffic is switched to the remaining member links or device, which improves the reliability of the MLAG domain.

  • Load Balancing

In an MLAG domain, traffic can be load-balanced across the active links of each aggregation interface.

Basic Concepts

  • MLAG domain and domain ID

An MLAG domain defines the topology scope of MLAG calculation and control. An MLAG domain includes a pair of MLAG peer switches, the MLAG peer link, and the MLAG member ports. The MLAG domain ID uniquely identifies an MLAG domain and should be configured identically on both MLAG peer devices in the same MLAG domain.

Currently, only one MLAG domain can be configured on an MLAG device. A pair of MLAG peer devices can connect to different downstream devices to form different MLAGs, so one MLAG domain can contain multiple MLAGs.

Figure 2 shows an MLAG domain with multiple MLAGs: Switch1, Switch2, and the MLAG member ports connected to Switch3 form MLAG1, while Switch1, Switch2, and the MLAG member ports connected to Switch4 form MLAG2.

Figure 2. Multiple MLAGs Network

Use the run show mlag domain command to view the MLAG domain information:

admin@Xorplus# run show mlag domain summary

Domain ID: 1    Domain MAC: 48:6E:73:FF:00:01    Node ID: 0
----------------------------------------------------------------------------------------------
Peer Link  Peer IP          Peer Vlan  Neighbor Status    Config Matched       MAC Synced  # of Links
---------  ---------------  ---------  ---------------    --------------      ----------  ----------
ae23       1.1.1.2          4088        ESTABLISHED         Yes                Yes          2

NOTE:

  • The MLAG domain ID must be unique within the Layer 2 network.
  • The maximum number of MLAG interfaces/ports supported by the system is limited by the maximum number of LAGs supported by the switch. For the maximum number of LAGs supported by each model, see the command reference for set interface aggregate-ethernet <lag_name> and Collection of Feature Specification of Different Platforms.

  • MLAG domain MAC

Each MLAG domain has a domain ID, which must differ between MLAG domains. Once the ID is configured, both MLAG peer devices use it to automatically generate a unique MLAG domain MAC address in the format 48:6E:73:FF:00:<MLAG domain ID in hexadecimal>. For example, if the MLAG domain ID is 12, the corresponding MLAG domain MAC address is 48:6E:73:FF:00:0C.

The MLAG domain MAC address is identical on both MLAG peer devices. When communicating with other L2 devices, LACP uses it as part of the system ID and STP uses it as part of the bridge ID. Use the command run show mlag domain {<domain-id>| summary} to show the MLAG domain information, including the MLAG domain MAC.
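As an illustration, the following Python sketch derives the domain MAC from the domain ID using the format above. The function name mlag_domain_mac is hypothetical, and the range check is an assumption: the documented format reserves a single octet for the ID, and this page does not say how larger IDs would be encoded.

```python
def mlag_domain_mac(domain_id: int) -> str:
    """Build the MLAG domain MAC 48:6E:73:FF:00:<domain ID in hex>.

    Assumption: the domain ID fits in the single final octet (0-255).
    """
    if not 0 <= domain_id <= 0xFF:
        raise ValueError("this sketch only handles domain IDs from 0 to 255")
    # Format the ID as two uppercase hex digits, e.g. 12 -> "0C".
    return f"48:6E:73:FF:00:{domain_id:02X}"


if __name__ == "__main__":
    print(mlag_domain_mac(12))  # 48:6E:73:FF:00:0C
    print(mlag_domain_mac(1))   # 48:6E:73:FF:00:01, as in the show output above
```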

  • MLAG peer

MLAG peer devices are the pair of switches on which the MLAG function is enabled; each is defined as MLAG Node 0 or Node 1. Use the CLI command set protocols mlag domain <domain-id> node <0 | 1> to specify the node ID of each MLAG peer device. If one MLAG peer device is configured as Node 0, the other must be configured as Node 1. Both nodes are active, providing reliable dual access to the network for the MLAG access device.

The two nodes function equally and are not distinguished as master and slave. In most application scenarios they behave identically, except in the following two cases:

  1. A single-homed port uses its original port ID on the Node 0 peer device; on Node 1, an offset of 1024 is added to the Port Index to form the new port ID.
  2. An MLAG member port uses its original port ID on the Node 0 peer device; on Node 1, an offset of 512 is added to the Link ID to form the new port ID.

The port ID appears in the output of LACP- and STP-related show commands and in BPDU packets.
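The two offsets can be expressed as a small Python sketch. The function name and parameters are hypothetical; the sketch assumes the caller already knows the port's original port ID, Port Index, and MLAG link ID, which this page does not describe how to obtain.

```python
from typing import Optional

# Offsets described above; they apply only on the Node 1 peer device.
SINGLE_HOMED_PORT_OFFSET = 1024  # added to the Port Index
MLAG_MEMBER_PORT_OFFSET = 512    # added to the MLAG Link ID


def advertised_port_id(node_id: int, original_port_id: int, is_mlag_member: bool,
                       port_index: Optional[int] = None,
                       link_id: Optional[int] = None) -> int:
    """Return the port ID a peer device uses for a port (hypothetical helper)."""
    if node_id == 0:
        # Node 0 keeps the original port ID for both port kinds.
        return original_port_id
    if node_id != 1:
        raise ValueError("node_id must be 0 or 1")
    if is_mlag_member:
        if link_id is None:
            raise ValueError("link_id is required for an MLAG member port")
        return link_id + MLAG_MEMBER_PORT_OFFSET
    if port_index is None:
        raise ValueError("port_index is required for a single-homed port")
    return port_index + SINGLE_HOMED_PORT_OFFSET
```

For example, on Node 1 a single-homed port with Port Index 5 would use port ID 1029, and the member port of MLAG link 2 would use port ID 514.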

  • MLAG peer link

The MLAG peer link is the direct link between the MLAG peer devices. It carries part of the data traffic as well as MLAG state and MLAG control plane messages. Use the set protocols mlag domain <domain-id> peer-ip <peer-ipv4-address> peer-link <peer-interface-name> command to configure the IP address of the remote peer and the local peer-link interface. The interfaces directly connected at the two ends of the peer link are the peer-link ports.

A dedicated VLAN, the MLAG peer VLAN, MUST be assigned to the peer-link interface. The peer VLAN carries only MLAG control plane messages, not data traffic, and it is always in the forwarding state so that the MLAG peers can negotiate MLAG information. Use the following CLI command to configure the MLAG peer VLAN; the recommended value is 4088.

set protocols mlag domain <domain-id> peer-ip <peer-ipv4-address> peer-vlan <vlan-id>
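To show how the documented commands fit together, the following Python sketch renders the node, peer-link, and peer-VLAN commands for one peer device. The helper name is hypothetical, the VLAN range check (1 to 4094) is a generic 802.1Q assumption, and the sketch covers only the three commands shown on this page; LAG, VLAN, and member-port configuration are out of scope.

```python
import ipaddress
from typing import List


def mlag_peer_commands(domain_id: int, node_id: int, peer_ip: str,
                       peer_link: str, peer_vlan: int = 4088) -> List[str]:
    """Render the documented MLAG peer commands for one device (hypothetical helper)."""
    if node_id not in (0, 1):
        raise ValueError("node_id must be 0 or 1")
    if not 1 <= peer_vlan <= 4094:
        raise ValueError("peer_vlan must be a valid VLAN ID")
    ipaddress.IPv4Address(peer_ip)  # raises a ValueError subclass if not IPv4
    base = f"set protocols mlag domain {domain_id}"
    return [
        f"{base} node {node_id}",
        f"{base} peer-ip {peer_ip} peer-link {peer_link}",
        f"{base} peer-ip {peer_ip} peer-vlan {peer_vlan}",
    ]


if __name__ == "__main__":
    # Values taken from the run show mlag domain summary output earlier.
    for line in mlag_peer_commands(1, 0, "1.1.1.2", "ae23"):
        print(line)
```

With these values the sketch prints the node 0, peer-link ae23, and peer-vlan 4088 commands for domain 1. The other peer would use node 1 and this device's own IP address as its peer-ip.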

If the peer link goes down for any reason, MLAG control plane messages cannot be exchanged properly, and the MLAG system operates abnormally. In particular, if the peer link is down while the MLAG member ports on both devices are up, a split-brain failure occurs, from which the system cannot recover automatically.

Therefore, to ensure the reliability of the peer link, observe the following points when configuring and deploying it:

  1. Only one peer link connecting the two peer devices is allowed in an MLAG domain.
  2. Only one LAG port can be configured as the peer link.
  3. Use a LAG port with at least two directly connected physical ports to guarantee reliable communication between the peer devices over the peer link. No intermediate transmission device is allowed between the two peer devices on the peer link. Add all of the directly connected physical ports into one LAG port to form the peer link; more than one L2 connection between the MLAG peer switches is not supported.
  4. Use 10G or 40G ports for the peer link so that enough bandwidth is available when the network is deployed.
  5. Manually shutting down the peer link is strictly forbidden.
  6. The peer link MUST allow the traffic of all MLAG VLANs and non-MLAG VLANs.
  7. When numerous Rapid PVST+ instances are configured, BPDUs may arrive faster than the default BPDU queue processing rate in the CPU, resulting in BPDU packet loss or network loops. To resolve this problem, use the following CoPP command to increase the maximum bandwidth of the BPDU queue. The default value is 80 pps.

   set class-of-service scheduler bpdu-scheduler max-bandwidth-pps <value>

  8. When numerous MLAG instances are configured, MLAG control packets may arrive faster than the default MLAG queue processing rate in the CPU, resulting in MLAG control packet loss. To resolve this problem, use the following CoPP commands to increase the maximum bandwidth of the MLAG and MLAG MAC SYNC queues. The default value is 80 pps.

   set class-of-service scheduler mlag-scheduler max-bandwidth-pps <value>

   set class-of-service scheduler mlag-mac-sync-scheduler max-bandwidth-pps <value>
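A rough way to judge whether the 80 pps BPDU queue default is likely to be exceeded is to estimate the steady-state BPDU rate. The Python sketch below assumes that each Rapid PVST+ instance delivers about one BPDU per STP-participating port per hello interval (2 seconds by default in IEEE 802.1D/802.1w) and ignores topology-change bursts, so treat the result as a lower bound rather than a sizing rule. The function names are hypothetical.

```python
DEFAULT_BPDU_QUEUE_PPS = 80  # default BPDU queue rate stated above


def estimated_bpdu_pps(pvst_instances: int, stp_ports: int,
                       hello_time_s: float = 2.0) -> float:
    """Rough steady-state BPDU rate reaching the CPU.

    Assumption: about one BPDU per Rapid PVST+ instance per STP port per
    hello interval; topology-change bursts are ignored.
    """
    if hello_time_s <= 0:
        raise ValueError("hello_time_s must be positive")
    return pvst_instances * stp_ports / hello_time_s


def exceeds_bpdu_queue(pvst_instances: int, stp_ports: int,
                       hello_time_s: float = 2.0,
                       limit_pps: float = DEFAULT_BPDU_QUEUE_PPS) -> bool:
    """True if the rough estimate is above the configured BPDU queue rate."""
    return estimated_bpdu_pps(pvst_instances, stp_ports, hello_time_s) > limit_pps


if __name__ == "__main__":
    # Hypothetical deployment: 100 VLAN instances on 4 STP ports.
    print(estimated_bpdu_pps(100, 4))  # 200.0
    print(exceeds_bpdu_queue(100, 4))  # True
```

If the estimate is well above the default, raising max-bandwidth-pps on bpdu-scheduler with the command above is the documented remedy; the exact value to choose depends on the platform.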

NOTE:

  • When spanning tree protocol is enabled, the peer link port is always in forwarding state and won’t participate in the spanning tree calculation after peer link is established.
  • It is strongly recommended to use LACP protocol when configuring the peer link port.
  • MLAG member port

MLAG member port is the LAG port on the MLAG peer devices that interconnects to the downstream device.

Typically, the MLAG member ports on the two MLAG peer devices are configured with the same LAG ID to form an MLAG, although this is not required.

Each MLAG member port must be bound to an MLAG link ID. The paired MLAG member ports of the same MLAG must be bound to the same link ID, and different MLAGs are identified by different link IDs. For example, if an MLAG domain contains two MLAGs, link ID 1 can identify one MLAG and link ID 2 the other.

After all MLAG configuration is complete, the MLAG peer devices exchange MLAG control plane messages to determine MLAG pairs. When the local device receives an MLAG Control message from the peer device, it checks whether the link ID carried in the message matches a locally configured link ID. If the link IDs configured on the two devices match, the two devices form an MLAG pair.
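The pairing rule can be sketched in Python as a match on link IDs. The input shape (a mapping from link ID to LAG name on each device) and the function name are hypothetical; the example pairs ae1 with ae7 to reflect that the LAG IDs on the two peers need not be the same.

```python
from typing import Dict, Tuple


def pair_mlags(local_links: Dict[int, str],
               peer_links: Dict[int, str]) -> Dict[int, Tuple[str, str]]:
    """Pair local and peer MLAG member ports bound to the same MLAG link ID.

    Link IDs configured on only one device stay unpaired.
    """
    common_ids = sorted(local_links.keys() & peer_links.keys())
    return {link_id: (local_links[link_id], peer_links[link_id])
            for link_id in common_ids}


if __name__ == "__main__":
    local = {1: "ae1", 2: "ae2"}
    peer = {1: "ae7", 3: "ae5"}
    print(pair_mlags(local, peer))  # {1: ('ae1', 'ae7')}
```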

Use the run show mlag link command to show information about each MLAG and the status of its MLAG member ports.

admin@XorPlus# run show mlag link summary
# of Links: 2
Link   Local LAG   Link Status   Local Status   Peer Status   Config Matched   Flood
----   ---------   -----------   ------------   -----------   --------------   -----
1      ae1         IDLE          UP             UNKNOWN       No               No  
2      ae2         IDLE          UP             UNKNOWN       No               No

Figure 3. MLAG Member Port

Devices accessing the MLAG domain must support link aggregation. As shown in Figure 3, SwitchB must be configured with a LAG interface that connects to the MLAG member ports.

NOTE:

It is strongly recommended to use LACP protocol when configuring the LAG interface.

MLAG State Machine

The MLAG state machine describes the state of the MLAG peer link and the MLAG member ports on the local device and the remote peer device, and it facilitates link fault detection and recovery. The system uses the MLAG neighbor state to track the peer link and the MLAG interface state to track each MLAG configured in the MLAG domain.

MLAG uses TCP to reliably transmit MLAG control messages between the two peer devices so that they can exchange MLAG state changes. Each device updates its state based on the local MLAG state and the MLAG Control messages received from the peer. You can view the MLAG interface state and the MLAG neighbor state by using the related show commands.

MLAG Neighbor State

MLAG neighbor state shows the global status of MLAG peer device and peer-link, including the following values:

  • IDLE: The initial state of the global neighbor state machine after the MLAG peer link is configured.
  • CONNECTING: The peer-link ports are up and the peer-link connection has started. Both MLAG peers try to set up a TCP connection to each other.
  • ESTABLISHED: The peer-link connection between the MLAG peer devices is established, and the peer session and neighbor relationship are set up.
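The following Python sketch models the neighbor states listed above as a simple function of two observable conditions. It assumes the peer link is already configured and treats IDLE as the fallback whenever the peer-link ports are down; that fallback, the enum, and the function name are illustrative assumptions, not the device's actual state machine.

```python
from enum import Enum


class NeighborState(Enum):
    IDLE = "IDLE"
    CONNECTING = "CONNECTING"
    ESTABLISHED = "ESTABLISHED"


def neighbor_state(peer_link_ports_up: bool, tcp_session_up: bool) -> NeighborState:
    """Map peer-link conditions to an MLAG neighbor state (simplified model).

    Assumes the MLAG peer link is already configured.
    """
    if not peer_link_ports_up:
        return NeighborState.IDLE         # configured, but ports not up yet
    if not tcp_session_up:
        return NeighborState.CONNECTING   # ports up, TCP session being set up
    return NeighborState.ESTABLISHED      # peer session and neighbor relationship up
```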

You can use the run show mlag domain {<domain-id>| summary} command to view the MLAG peer-link configuration information and the neighbor state. For example,

admin@Xorplus# run show mlag domain summary
Domain ID: 1    Domain MAC: 48:6E:73:FF:00:01    Node ID: 0
----------------------------------------------------------------------------------------------
Peer Link  Peer IP          Peer Vlan  Neighbor Status    Config Matched       MAC Synced  # of Links
---------  ---------------  ---------  ---------------    --------------      ----------  ----------
ae23       1.1.1.2          4088        ESTABLISHED         Yes                Yes          2

MLAG Interface State

MLAG interface state defines the status of peer link and MLAG member port, including the following values:

  • INIT: The initial state. MLAG is disabled and no information is exchanged.
  • IDLE: The peer link is configured and the MLAG peer device initiates a TCP connection to the peer, but the peer-link session has not yet been established. The MLAG link state changes from INIT to IDLE.
  • DOWN: The peer-link session is established (the MLAG neighbor state is ESTABLISHED), but the corresponding MLAG member port is not configured on the MLAG peer device, and the local MLAG member port is down.
  • STANDBY: The peer-link session is established, but the corresponding MLAG member port is not configured on the MLAG peer device, and the local MLAG member port is up.
  • AS_DOWN: The peer-link session is established and MLAG member ports are configured on both MLAG devices, but the MLAG member ports on both sides are down.
  • AS_PEER: The peer-link session is established and MLAG member ports are configured on both MLAG devices; the local MLAG member port is down but the peer MLAG member port is up.
  • AS_LOCAL: The peer-link session is established and MLAG member ports are configured on both MLAG devices; the local MLAG member port is up but the peer MLAG member port is down.
  • FULL: The peer-link session is established and the MLAG member ports on both peer devices are up.

The states can be summarized in the following table:

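MLAG Interface State   Peer Link Session Established   Peer MLAG Member Port Configured   Local MLAG Member Port Up   Peer MLAG Member Port Up
--------------------   -----------------------------   --------------------------------   -------------------------   ------------------------
INIT                   -                               -                                  -                           -
IDLE                   -                               -                                  -                           -
DOWN                   Yes                             -                                  -                           -
STANDBY                Yes                             -                                  Yes                         -
AS_DOWN                Yes                             Yes                                -                           -
AS_PEER                Yes                             Yes                                -                           Yes
AS_LOCAL               Yes                             Yes                                Yes                         -
FULL                   Yes                             Yes                                Yes                         Yes

In the table, "Yes" means the condition is met, and "-" means it is not met or does not apply.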
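To make the state definitions easier to check, the following Python sketch derives the MLAG interface state from the same conditions used in the state descriptions and the summary table. Treating "peer link not configured" as INIT is an assumption (the text only says that MLAG is disabled in INIT), and the function name is hypothetical.

```python
def mlag_interface_state(peer_link_configured: bool, session_established: bool,
                         peer_member_configured: bool, local_up: bool,
                         peer_up: bool) -> str:
    """Derive the MLAG interface state from the documented conditions."""
    if not peer_link_configured:
        return "INIT"      # assumption: no peer link means MLAG is not active
    if not session_established:
        return "IDLE"      # peer link configured, session not yet up
    if not peer_member_configured:
        # Only the local member port exists; the peer side does not apply.
        return "STANDBY" if local_up else "DOWN"
    if local_up and peer_up:
        return "FULL"
    if local_up:
        return "AS_LOCAL"
    if peer_up:
        return "AS_PEER"
    return "AS_DOWN"
```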

You can use the run show mlag link {<link-id>| summary} command to view the state of the MLAG interface. For example,

admin@XorPlus# run show mlag link summary
# of Links: 2
Link   Local LAG   Link Status   Local Status   Peer Status   Config Matched   Flood
----   ---------   -----------   ------------   -----------   --------------   -----
1      ae1         IDLE          UP             UNKNOWN       No               No  
2      ae2         IDLE          UP             UNKNOWN       No               No

In the output, Link Status shows the MLAG interface state, and Local Status shows the status of the local MLAG member port.

MLAG Control Plane Messages

MLAG control plane messages are used to transmit the following information between the MLAG peer devices:

  • MLAG state information.
  • Synchronization information (including STP information synchronization and multicast control information synchronization).
  • Configuration consistency check.

MLAG control plane messages fall into two categories: L2 packets and TCP packets.

  • For L2 packets, the destination MAC address is 01:80:C2:00:00:0F and the EtherType is 0x6666.
  • For TCP packets, the destination port is 0xE290 (58000 in decimal).
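The following Python sketch shows how a capture tool might recognize the two categories from already-parsed header fields. The constants come from the list above; the assumption that the TCP session runs over IPv4 follows from the peer-ip <peer-ipv4-address> parameter. The function checks only the destination port, so segments sent back from port 0xE290 are not matched. The function name and return values are hypothetical.

```python
from typing import Optional

MLAG_L2_DST_MAC = "01:80:c2:00:00:0f"
MLAG_L2_ETHERTYPE = 0x6666
MLAG_TCP_DST_PORT = 0xE290  # 58000 in decimal
IPV4_ETHERTYPE = 0x0800
IPPROTO_TCP = 6


def classify_mlag_packet(dst_mac: str, ethertype: int,
                         ip_proto: Optional[int] = None,
                         tcp_dst_port: Optional[int] = None) -> Optional[str]:
    """Return "l2", "tcp", or None for traffic that is not MLAG control traffic."""
    if dst_mac.lower() == MLAG_L2_DST_MAC and ethertype == MLAG_L2_ETHERTYPE:
        return "l2"
    if (ethertype == IPV4_ETHERTYPE and ip_proto == IPPROTO_TCP
            and tcp_dst_port == MLAG_TCP_DST_PORT):
        return "tcp"
    return None
```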

The common header of MLAG control plane messages has the following format:

Field     Description
-------   ---------------------------------------------------------------
Version   This field specifies the MLAG version. Currently, the version is 0x1.
Type      This field specifies the type of MLAG control plane message:
            • 0x1 indicates MLAG Control message.
            • 0x2 indicates MAC Sync message.
            • 0x3 indicates STP Sync message.
            • 0x5 indicates Multicast Control Sync message.
            • 0x6 indicates Configuration Consistency message.

In addition to the MAC Synchronization message, there are four types of MLAG control plane messages:

  • MLAG Control message

The MLAG Control message is used to maintain the MLAG status.

An MLAG device sends an MLAG Control message in either of the following cases:

  1. The MLAG neighbor state changes to ESTABLISHED.
  2. The state of any MLAG interface changes.

MLAG Control messages are encapsulated and transmitted via TCP protocol.

  • STP Sync

The STP Sync message is used to synchronize STP dynamic information derived from received BPDUs, such as the calculated root priority and link cost, to the peer switch. The STP Sync message is encapsulated and transmitted via TCP.

  • Multicast Control Sync

The Multicast Control Sync message is used to synchronize IGMP/PIM dynamic information from received IGMP/PIM messages to the peer switch. The Multicast Control Sync message is encapsulated and transmitted via TCP.

IGMP treats the MLAG LAG link as a single logical link, so IGMP packets are synchronized between the MLAG peer devices over the peer link by Multicast Control Sync messages:

An IGMP packet received by either MLAG peer switch on an MLAG port is synchronized to the other peer switch over the peer link, as if it had been received on that switch's local MLAG port.

  • Configuration Consistency

The Configuration Consistency message is used to check the MLAG related configuration consistency between MLAG peers. The Configuration Consistency message is encapsulated and transmitted via TCP protocol.

An MLAG device sends a Configuration Consistency message in either of the following cases:

  1. A new MLAG-related configuration is committed.
  2. The MLAG neighbor state changes to ESTABLISHED.
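For reference, the following Python sketch collects the Type values from the common header together with the transport described for each message type in this section (MAC Sync messages use L2 multicast; the other four types use TCP). The names are illustrative, and because the field widths of the common header are not documented here, the sketch does not parse raw bytes.

```python
from enum import IntEnum


class MlagMessageType(IntEnum):
    """Values of the Type field in the MLAG control plane common header."""
    MLAG_CONTROL = 0x1
    MAC_SYNC = 0x2
    STP_SYNC = 0x3
    MULTICAST_CONTROL_SYNC = 0x5
    CONFIG_CONSISTENCY = 0x6


# Transport used by each message type, as described in this section.
TRANSPORT = {
    MlagMessageType.MLAG_CONTROL: "tcp",
    MlagMessageType.MAC_SYNC: "l2-multicast",
    MlagMessageType.STP_SYNC: "tcp",
    MlagMessageType.MULTICAST_CONTROL_SYNC: "tcp",
    MlagMessageType.CONFIG_CONSISTENCY: "tcp",
}


def describe_type(type_field: int) -> str:
    """Return a readable description of a Type field value."""
    try:
        msg_type = MlagMessageType(type_field)
    except ValueError:
        return f"unknown type 0x{type_field:X}"
    return f"{msg_type.name} over {TRANSPORT[msg_type]}"
```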

MAC Synchronization

To ensure that the traffic of the same user can be forwarded normally by both MLAG peer devices, the MAC address tables on the two peer devices must be consistent. This is accomplished by the MAC synchronization mechanism, which sends MAC synchronization messages as L2 multicast packets with destination address 01:80:C2:00:00:0F. An MD5 checksum is added to each message to ensure that the MAC address table is synchronized correctly.

Synchronizing the MAC address tables between the MLAG peer switches also limits the peer-link bandwidth consumed by flooding unknown unicast traffic.

A MAC Sync message is sent only when both of the following conditions are met:

  • The MLAG neighbor state is ESTABLISHED.
  • There is a change in the MAC table.

Three types of MAC addresses are defined in MLAG: Static, Dynamic, and Peer-Sync. Peer-Sync represents a dynamic MAC address synchronized from the MLAG peer device, and its priority is lower than that of a static MAC address. If one of the MLAG peer switches fails, the Peer-Sync MAC addresses on the other switch are deleted from the MAC address table.

MAC addresses learned on any port except the peer-link port are synchronized to the MLAG peer switch over the peer link. The MLAG peer's system MAC address, which is learned on the peer link, is internally configured as a static MAC address. Newly learned MAC addresses are synchronized to the peer switch immediately.

  • How to update the MAC table with synced MAC addresses:
    • MAC addresses learned on a single-homed port are synchronized to the peer-link port of the peer switch.
    • MAC addresses learned on an MLAG member port are synchronized to the corresponding MLAG member port of the peer device over the MLAG peer link.
    • The system MAC is synchronized to the MLAG peer-link port of the peer switch as a static MAC address.
    • The MAC addresses learned on the peer link port are not synced.
  • How to define the type of the MAC addresses:
    • If a MAC address is not statically configured but only learned on local MLAG switch, it is marked as “Dynamic” on local switch and “Peer-Sync” on peer switch.
    • If a MAC address is not statically configured but learned on both MLAG peer switches, it is marked as “Dynamic” on Node 0 switch and “Peer-Sync” on Node 1 switch.
    • A static MAC address has a higher priority: it is not overridden by “Dynamic” or “Peer-Sync” MAC addresses, but it overrides both types.
    • Static MAC addresses are not synchronized automatically; they must be configured manually on the peer switch.
    • If a static MAC address bound to a single-homed port is configured on one MLAG device, a static MAC address entry bound to the peer-link interface should be configured manually on the peer switch.
    • If a static MAC address bound to an MLAG member port is configured on one MLAG device, a static MAC address entry bound to the corresponding MLAG member port should be configured on the peer switch.
  • How the MAC addresses age out:

If a MAC address (Dynamic or Peer-Sync) ages out or is cleared by a CLI command on one of the MLAG peer devices, the change is synchronized to the peer switch, and the address is removed there as well.
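The synchronization and type rules above can be condensed into a small Python sketch. It models only the learned-address rules stated in this section (not the system MAC or manual static entries), and the class and function names are hypothetical. The test values come from the Figure 4 MAC tables, where ae3 appears to be the peer link on both switches; binding ae1 to MLAG link ID 1 is an assumption, since the tables do not show link IDs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LearnedMac:
    vlan: int
    mac: str
    port_kind: str                 # "single-homed", "mlag-member" or "peer-link"
    link_id: Optional[int] = None  # MLAG link ID, for MLAG member ports only


def peer_sync_entry(entry: LearnedMac, peer_link_lag: str,
                    peer_member_lags: Dict[int, str]) -> Optional[Tuple[int, str, str, str]]:
    """Return the (VLAN, MAC, type, interface) entry the peer installs, or None."""
    if entry.port_kind == "peer-link":
        return None  # MACs learned on the peer-link port are not synced
    if entry.port_kind == "single-homed":
        return (entry.vlan, entry.mac, "Peer-Sync", peer_link_lag)
    if entry.port_kind == "mlag-member":
        if entry.link_id is None:
            raise ValueError("MLAG member entries need a link ID")
        # Synced to the peer's member port bound to the same MLAG link ID.
        return (entry.vlan, entry.mac, "Peer-Sync", peer_member_lags[entry.link_id])
    raise ValueError(f"unknown port kind: {entry.port_kind}")


def entry_type(static: bool, learned_locally: bool, learned_on_peer: bool,
               node_id: int) -> Optional[str]:
    """Type shown in the local MAC table, following the rules above."""
    if static:
        return "Static"  # static entries override Dynamic and Peer-Sync
    if learned_locally and learned_on_peer:
        return "Dynamic" if node_id == 0 else "Peer-Sync"
    if learned_locally:
        return "Dynamic"
    if learned_on_peer:
        return "Peer-Sync"
    return None
```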

You can use the run show mac-address table command to view the information about MAC address table, such as MAC address statistics, VLAN ID, MAC address, MAC address type and outbound interface.

For example,

Figure 4. A MAC Sync Example

The MAC tables on Switch A and Switch B show that a dynamic MAC entry learned on an MLAG member port is synchronized to the corresponding MLAG member port on the peer device, and a dynamic MAC entry learned on a single-homed port is synchronized to the peer-link port on the peer device.

admin@SwitchA# run show mac-address table
Total entries in switching table:   3
Static entries in switching table:  0
Dynamic entries in switching table: 3 

VLAN      MAC address           Type         Age      Interfaces         User
----      -----------------     ---------    ----     ----------------   ------
1         08:9e:01:61:64:13     Dynamic      300      ge-1/1/2           xorp
1         cc:37:ab:4f:ad:01     Peer-Sync    300      ae1                xorp
4088      8c:ea:1b:88:5b:81     Static       300      ae3                xorp 

admin@SwitchB# run show mac-address table
Total entries in switching table:   3
Static entries in switching table:  0
Dynamic entries in switching table: 3

VLAN      MAC address           Type         Age     Interfaces         User
----      -----------------     ---------    ----    ----------------   ------
1         08:9e:01:61:64:13     Peer-Sync    300     ae3                xorp
1         cc:37:ab:4f:ad:01     Dynamic      300     ae1                xorp
4088      8c:ea:1b:88:5b:82     Static       300     ae3                xorp

When VXLAN is deployed in an MLAG domain, MAC synchronization between the MLAG peer devices works differently.

As shown in the following figure, the access switches SwitchC and SwitchD are dual-homed to an MLAG domain. In addition, a VXLAN tunnel is established between the MLAG peer devices SwitchA and SwitchB, so that Layer 2 devices on the access side can communicate over Layer 3 networks.

Figure 5. MLAG Topology with VXLAN