Configuring PFC Watchdog

Priority Flow Control (PFC) is a mechanism used in data center networks to ensure lossless transmission for high-priority traffic by pausing traffic when congestion is detected. While PFC helps in managing traffic congestion, it can potentially lead to a situation known as a PFC deadlock. To address this issue, network devices employ a PFC watchdog mechanism to detect and mitigate PFC deadlocks.

Understanding PFC Deadlock

A PFC deadlock occurs when multiple devices in a network are continuously sending PFC pause frames to each other, leading to a situation where traffic is indefinitely paused, causing a complete halt in data transmission. This deadlock can severely impact network performance and application availability.

PFC Watchdog Mechanism

The PFC watchdog is designed to detect and resolve PFC deadlocks. It monitors the duration of PFC pause frames and takes corrective actions if a potential deadlock is detected.

PFC Deadlock Detection and Recovery

With PFC deadlock detection configured, the device periodically checks whether it is in a PFC deadlock state. When it detects a deadlock, it automatically starts recovery, which lasts for the recovery period. During this period the device either resumes sending traffic for the affected priority queue or, if configured to do so, discards the traffic for that queue. After the recovery period, the normal PFC flow control mechanism is restored. If a deadlock is detected again during the next detection cycle, a new round of deadlock recovery is initiated.
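The detection and recovery cycle can be sketched as a small state machine. This is an illustrative model only, not the device implementation: the class name QueueWatchdog, the tick-based timing and the example timer values are assumptions made for the sketch. The state names operational and stormed are borrowed from the status values shown by the statistics command.

```python
from dataclasses import dataclass


@dataclass
class QueueWatchdog:
    """Toy model of deadlock detection and recovery for one priority queue."""
    detect_ticks: int    # consecutive paused ticks that count as a deadlock
    restore_ticks: int   # length of the recovery period, in ticks
    paused_for: int = 0
    recovering_for: int = 0
    state: str = "operational"   # "stormed" while a deadlock is being recovered
    deadlocks: int = 0

    def tick(self, paused: bool) -> str:
        """Advance one tick; `paused` says whether PFC is pausing the queue."""
        if self.state == "stormed":
            # During recovery the pause is ignored (or traffic is dropped).
            self.recovering_for += 1
            if self.recovering_for >= self.restore_ticks:
                # Recovery period over: normal PFC flow control is restored.
                self.state = "operational"
                self.paused_for = 0
            return self.state
        self.paused_for = self.paused_for + 1 if paused else 0
        if self.paused_for >= self.detect_ticks:
            # Paused for the whole detection time: treat it as a deadlock.
            self.state = "stormed"
            self.recovering_for = 0
            self.deadlocks += 1
        return self.state


# A queue that stays paused is detected, recovered, then detected again.
wd = QueueWatchdog(detect_ticks=3, restore_ticks=2)
states = [wd.tick(paused=True) for _ in range(8)]
print(states, wd.deadlocks)
```

In this model a permanently paused queue alternates between detection and recovery, which is the situation the deadlock control process is meant to catch.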

PFC Deadlock Control Process

If the recovery procedure above is ineffective and PFC deadlocks keep recurring, users can configure the system to enter the deadlock control process once a set number of deadlocks occurs within a specified period. Reaching this limit indicates a high risk of frequent deadlocks in the network. The device then automatically disables the PFC function to ensure normal packet forwarding and to clear the deadlock state.

After the PFC deadlock state is resolved, users have to restore the PFC deadlock detection function manually. Restoring the PFC deadlock detection function reactivates the PFC feature.
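The "number of deadlocks within a specified period" rule can be illustrated with a counter over a time window. This is a minimal sketch under stated assumptions: the class DeadlockLimiter is hypothetical, the window is modeled as a sliding window, and PFC is disabled when the count reaches the limit. The source does not say how the device measures the period internally.

```python
from collections import deque


class DeadlockLimiter:
    """Counts deadlocks in a sliding time window (illustrative only)."""

    def __init__(self, count: int, period: float) -> None:
        self.count = count        # deadlocks tolerated before PFC is disabled
        self.period = period      # window length, same unit as the timestamps
        self.events: deque = deque()
        self.pfc_enabled = True

    def record_deadlock(self, now: float) -> bool:
        """Record a deadlock at time `now`; return whether PFC is still enabled."""
        self.events.append(now)
        # Forget deadlocks that fall outside the window.
        while self.events and now - self.events[0] > self.period:
            self.events.popleft()
        if len(self.events) >= self.count:
            self.pfc_enabled = False   # enter the deadlock control process
        return self.pfc_enabled

    def clear(self) -> None:
        """Model the manual reset that re-enables PFC."""
        self.events.clear()
        self.pfc_enabled = True


limiter = DeadlockLimiter(count=3, period=10.0)
results = [limiter.record_deadlock(t) for t in (0.0, 4.0, 20.0, 22.0, 25.0)]
print(results)
```

The first two deadlocks age out of the window before the third arrives, so PFC is only disabled once three deadlocks fall within the same 10-unit window.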

Key Components and Functions of PFC Watchdog

  1. Monitoring PFC Pause Frames: The PFC watchdog continuously monitors the network for PFC pause frames. It keeps track of the duration for which these frames are active for each priority class on each port.

The following command can be used to enable PFC watchdog functionality:

set class-of-service interface <interface-name> pfc-watchdog code-point <cos> enable <true | false>

  2. Timeout Threshold: A configurable timeout threshold is set for PFC pause frames. If a pause frame for a specific priority exceeds this threshold, it indicates a potential deadlock situation.

The detect timer can be configured by the following commands:

set class-of-service pfc-watchdog granularity <10 | 100>

set class-of-service pfc-watchdog code-point <cos> detect-interval <detect-interval>

  3. Deadlock Detection: When the duration of a PFC pause frame surpasses the timeout threshold, the PFC watchdog triggers a deadlock detection process. This process identifies the ports and priority classes involved in the deadlock.

  4. Restore Actions: Once a potential deadlock is detected, the PFC watchdog takes restore actions to break the deadlock. These actions typically include:

  • Forward: During the PFC deadlock recovery process, the received PFC PAUSE frame will be ignored, and the internal scheduler will resume forwarding the traffic.

  • Drop: During the PFC deadlock recovery process, received data packets for the corresponding priority queue are dropped.

After a predefined period, the PFC watchdog re-enables PFC for the affected priority class. This period allows the network to stabilize and clear any residual congestion that could cause another deadlock.

The restore action can be configured by the following commands:

set class-of-service pfc-watchdog restore-action <forward | drop>

set class-of-service pfc-watchdog code-point <cos> restore-interval <restore-interval>

  5. Set the Recovery Mode for a Deadlocked Port

The following command can be used to configure the restore mode. The default is automatic recovery.

set class-of-service interface <interface-name> pfc-watchdog restore-mode <manual | auto>

The two restore modes, Auto and Manual, use different deadlock detection processes and different PFC deadlock recovery methods:

Auto

When PFC watchdog functionality is enabled and the restore mode for a port is set to Auto recovery, the PFC watchdog continuously monitors the PFC activity on the port. If deadlocks repeatedly occur and the count exceeds the configured threshold within the specified time period, the system determines that the port is in an unstable state and will automatically disable the PFC feature to prevent further network disruption.

Once PFC is disabled, the port will no longer use PFC to manage congestion until it is manually reset by using the following command:

run clear class-of-service interface <interface-name> pfc-watchdog auto

Manual

When PFC watchdog functionality is enabled and the restore mode for a port is set to Manual recovery, the PFC watchdog continuously monitors the PFC activity on the port. As soon as a PFC deadlock occurs, the PFC function on that port is automatically disabled.

Once PFC is disabled, the port will no longer use PFC to manage congestion until it is manually reset by using the following command:

run clear class-of-service interface <interface-name> pfc-watchdog manual

This ensures that persistent deadlocks do not degrade network performance.
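The difference between the two restore modes can be summarized in a few lines of code. This is a hedged sketch rather than device logic: the function names are hypothetical, and the comparison assumes that PFC is disabled when the deadlock count reaches the configured threshold. The reset command strings follow the syntax given above.

```python
def pfc_disabled(restore_mode: str, deadlocks_in_period: int,
                 threshold_count: int) -> bool:
    """Return True if PFC ends up disabled on the port."""
    if restore_mode == "manual":
        # Manual: the first deadlock disables PFC on the port.
        return deadlocks_in_period >= 1
    if restore_mode == "auto":
        # Auto: PFC is disabled only after repeated deadlocks in the period.
        return deadlocks_in_period >= threshold_count
    raise ValueError(f"unknown restore mode: {restore_mode!r}")


def clear_command(interface: str, restore_mode: str) -> str:
    """Build the reset command that re-enables PFC for the given mode."""
    if restore_mode not in ("manual", "auto"):
        raise ValueError(f"unknown restore mode: {restore_mode!r}")
    return (f"run clear class-of-service interface {interface} "
            f"pfc-watchdog {restore_mode}")


print(pfc_disabled("manual", 1, 5))   # one deadlock is enough in manual mode
print(pfc_disabled("auto", 1, 5))     # auto mode tolerates it and recovers
print(clear_command("te-1/1/25", "auto"))
```

The interface name te-1/1/25 is only an example value.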

  6. Set the Maximum Number of PFC Deadlocks within a Specified Period

When the restore mode for a port is set to automatic recovery and the number of PFC deadlocks reaches the upper limit within the specified period, the PFC function on that port is disabled. In this case, users need to perform step 7 to re-enable the PFC function.

set class-of-service pfc-watchdog threshold period <time>

set class-of-service pfc-watchdog threshold count <count>

  7. Re-enabling PFC and PFC Watchdog

• When a port experiences a deadlock and the recovery mode is set to manual, this command needs to be run to re-enable the PFC function:

run clear class-of-service interface <interface-name> pfc-watchdog manual

• When the deadlock limit is configured and the port deadlock reaches the upper limit, the reset command needs to be executed to re-enable the PFC function:

run clear class-of-service interface <interface-name> pfc-watchdog auto

Restrictions and Guidelines

When you configure PFC watchdog, follow these restrictions and guidelines:

  • PFC should be enabled on the interface before enabling PFC watchdog.

  • PFC watchdog is only supported on Trident3-X5, Trident3-X7 and Tomahawk3 platforms.

Configuring PFC Watchdog

Procedure

Step 1         Enable PFC on the interface before enabling PFC watchdog.

set class-of-service pfc-profile <pfc-profile-name> [code-point <cos> drop <true | false>]

set class-of-service interface <interface-name> pfc-profile <pfc-profile-name>

Step 2         Enable PFC watchdog. By default, PFC watchdog is disabled.

set class-of-service interface <interface-name> pfc-watchdog code-point <cos> enable <true | false>

Step 3         (Optional) Configure the time interval of PFC deadlock detection. The default detection time is 1.5 seconds. The detection time is calculated as granularity x detect-interval.

set class-of-service pfc-watchdog granularity <10 | 100>

set class-of-service pfc-watchdog code-point <cos> detect-interval <detect-interval> (On Trident3 platforms)

set class-of-service interface <interface-name> pfc-watchdog code-point <cos> detect-interval <detect-interval> (On Tomahawk3 platforms)

Step 4         (Optional) Configure the restore time and restore action when a PFC deadlock occurs. The default restore action is forward. The restore time is calculated as granularity x restore-interval.

set class-of-service pfc-watchdog restore-action <forward | drop>

set class-of-service pfc-watchdog code-point <cos> restore-interval <restore-interval> (On Trident3 platforms)

set class-of-service interface <interface-name> pfc-watchdog code-point <cos> restore-interval <restore-interval> (On Tomahawk3 platforms)
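The timer arithmetic in Steps 3 and 4 (time = granularity x interval) can be checked with a short helper. The function names are hypothetical. The sketch assumes that the granularity is expressed in milliseconds, which this guide does not state explicitly, and it does not know the valid range of the interval values, so a computed interval may still be rejected by the device.

```python
GRANULARITIES = (10, 100)   # the two values accepted by the granularity command


def timer_value(granularity: int, interval: int) -> int:
    """Return granularity x interval, in the (assumed) unit of the granularity."""
    if granularity not in GRANULARITIES:
        raise ValueError("granularity must be 10 or 100")
    if interval <= 0:
        raise ValueError("interval must be positive")
    return granularity * interval


def intervals_for(target: int) -> dict[int, int]:
    """For each granularity that divides `target` evenly, return the interval."""
    return {g: target // g for g in GRANULARITIES if target % g == 0}


# If the unit is milliseconds, granularity 100 with interval 15 gives 1500,
# which matches the default detection time of 1.5 seconds.
print(timer_value(100, 15))
print(intervals_for(1500))
print(intervals_for(150))
```

Because the granularity is set once and is not per code point, it is worth picking the granularity first and then deriving each detect-interval and restore-interval from it.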

Step 5         (Optional) Set the restore mode for a deadlocked port. If this command is not configured, the default is automatic recovery.

set class-of-service interface <interface-name> pfc-watchdog restore-mode <manual | auto>

Step 6         (Optional) Set the maximum number of PFC deadlocks within a specified period.

When the restore mode for a port is set to automatic recovery and the number of PFC deadlocks reaches the upper limit within the specified period, the PFC function on that port is disabled. In this case, users need to perform step 8 to re-enable the PFC function.

set class-of-service pfc-watchdog threshold period <time>

set class-of-service pfc-watchdog threshold count <count>

Step 7         Commit the configuration.   

commit
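As a worked example, the following sketch assembles the command sequence for Steps 1 to 5 and Step 7 from the templates above, using the Trident3 syntax for the interval commands. The helper name, the profile name pfc1 and all numeric values are hypothetical. The sketch also assumes that "drop false" in the PFC profile is the value that enables PFC for the code point. The optional threshold commands from Step 6 are left out.

```python
def pfc_watchdog_commands(interface: str, profile: str, cos: int,
                          detect_interval: int, restore_interval: int,
                          granularity: int = 100,
                          restore_action: str = "forward",
                          restore_mode: str = "auto") -> list[str]:
    """Build the command sequence for one interface and one code point."""
    if granularity not in (10, 100):
        raise ValueError("granularity must be 10 or 100")
    if restore_action not in ("forward", "drop"):
        raise ValueError("restore_action must be 'forward' or 'drop'")
    if restore_mode not in ("manual", "auto"):
        raise ValueError("restore_mode must be 'manual' or 'auto'")
    cos_prefix = f"set class-of-service pfc-watchdog code-point {cos}"
    port_prefix = f"set class-of-service interface {interface}"
    return [
        # Step 1: enable PFC (assumes "drop false" enables PFC for the code point).
        f"set class-of-service pfc-profile {profile} code-point {cos} drop false",
        f"{port_prefix} pfc-profile {profile}",
        # Step 2: enable the watchdog for the code point on the interface.
        f"{port_prefix} pfc-watchdog code-point {cos} enable true",
        # Step 3: detection time = granularity x detect-interval.
        f"set class-of-service pfc-watchdog granularity {granularity}",
        f"{cos_prefix} detect-interval {detect_interval}",
        # Step 4: restore action and restore time.
        f"set class-of-service pfc-watchdog restore-action {restore_action}",
        f"{cos_prefix} restore-interval {restore_interval}",
        # Step 5: restore mode for the port.
        f"{port_prefix} pfc-watchdog restore-mode {restore_mode}",
        # Step 7: commit the configuration.
        "commit",
    ]


cmds = pfc_watchdog_commands("te-1/1/25", "pfc1", 5,
                             detect_interval=15, restore_interval=15)
print("\n".join(cmds))
```

On Tomahawk3 platforms the detect-interval and restore-interval commands take the interface form shown in Steps 3 and 4, so the two lines built from cos_prefix would need to be adapted.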

Step 8         (Optional) Re-enable PFC and PFC watchdog.

• When a port experiences a deadlock and the recovery mode is set to manual, this command needs to be run to re-enable the PFC function:

run clear class-of-service interface <interface-name> pfc-watchdog manual

• When the deadlock limit is configured and the port deadlock reaches the upper limit, the reset command needs to be executed to re-enable the PFC function:

run clear class-of-service interface <interface-name> pfc-watchdog auto

Verifying the Configuration

  • After completing the configuration, use the run show pfc-watchdog config command to view the PFC watchdog configuration.

admin@PICOS# run show pfc-watchdog config
PORT        ACTION       QUEUE         DETECTION TIME    RESTORATION TIME
----------  -----------  ------------  ----------------  ------------------
te-1/1/25   drop         5             150               150
                         6             150               150
                         7             120               110

  • Use the run show pfc-watchdog stats command to view PFC watchdog statistics, including the number of PFC pause storms detected and restored and the number of packets dropped on the PFC queues of an interface.

admin@PICOS# run show pfc-watchdog stats
QUEUE         STATUS       STORM DETECTED/RESTORED    TX OK/DROP        TX LAST OK/DROP
------------  -----------  -------------------------  ----------------  -----------------
te-1/1/25:5   stormed      9/8                        82072626556/0     32053822365/0
te-1/1/25:6   stormed      9/8                        31504345475/0     32053822365/0
te-1/1/25:7   operational  0/0                        0/0               0/0

The fields in the command output are as follows:

  • STATUS: The status of PFC watchdog. The value could be operational or stormed.  

    • operational: Currently under detection, no deadlock found.

    • stormed: Currently in a deadlock state.

  • STORM DETECTED: Queue deadlock counter.

  • STORM RESTORED: Queue restore counter.

  • TX DROP and TX LAST DROP: Number of Tx packets dropped due to PFC deadlock.

  • TX OK and TX LAST OK: Number of Tx packets transmitted during deadlock (Forward action).
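For scripted monitoring, a data row of the statistics output can be split into the fields described above. This is a minimal sketch that assumes the row layout of the sample output (five whitespace-separated columns, with counters joined by a slash). The QueueStats type and the parse_stats_row function are hypothetical names, not part of PICOS.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    port: str
    queue: int
    status: str            # "operational" or "stormed"
    storms_detected: int
    storms_restored: int
    tx_ok: int
    tx_drop: int
    tx_last_ok: int
    tx_last_drop: int


def _pair(field: str) -> tuple[int, int]:
    """Split an 'a/b' counter field into two integers."""
    left, right = field.split("/")
    return int(left), int(right)


def parse_stats_row(row: str) -> QueueStats:
    """Parse one data row of `run show pfc-watchdog stats`."""
    name, status, storms, tx, tx_last = row.split()
    # The queue number follows the last colon; the port name contains slashes.
    port, queue = name.rsplit(":", 1)
    detected, restored = _pair(storms)
    tx_ok, tx_drop = _pair(tx)
    last_ok, last_drop = _pair(tx_last)
    return QueueStats(port, int(queue), status, detected, restored,
                      tx_ok, tx_drop, last_ok, last_drop)


row = "te-1/1/25:5 stormed 9/8 82072626556/0 32053822365/0"
stats = parse_stats_row(row)
print(stats)
```

A queue whose detected counter is higher than its restored counter while the status is stormed, as in this row, is still inside a recovery cycle.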


Copyright © 2024 Pica8 Inc. All Rights Reserved.