Overview of Traffic Steering 

The Versa SD-WAN solution provides powerful and flexible mechanisms for realizing a wide variety of WAN path selection behaviors.

 

Versa branches use active monitoring to gather performance metrics on all paths towards all peer branches. The metrics include latency, jitter, and loss. They are used to determine whether a particular path is up and, if so, whether it meets the user-defined SLAs for a given application or set of applications. Traffic for those applications is sent only on SLA-compliant paths.

 

In addition, Versa software supports advanced traffic management capabilities to mark, rate limit, prioritize, and shape traffic. Together, the traffic management and path selection capabilities ensure SLA compliance for mission-critical traffic.


SLA Monitoring

Versa branches continuously monitor the performance of all paths towards all SDWAN peer branches. A branch-to-branch path is defined as any valid transport tunnel between the two branches. For example, if two branches have two broadband links each, and all of the links are in a single transport domain, there are four paths between the branches.
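The path count in the example above can be sketched as a cross product of the access circuits at each branch. This is only an illustration: the circuit names, the "internet" domain label, and the enumerate_paths helper are hypothetical and are not part of any Versa API.

```python
from itertools import product

def enumerate_paths(local_circuits, remote_circuits):
    """Return every (local, remote) circuit pair that shares a transport domain."""
    return [
        (loc["name"], rem["name"])
        for loc, rem in product(local_circuits, remote_circuits)
        if loc["domain"] == rem["domain"]
    ]

# Two branches with two broadband links each, all in one transport domain.
branch_a = [{"name": "bband1", "domain": "internet"},
            {"name": "bband2", "domain": "internet"}]
branch_b = [{"name": "bband3", "domain": "internet"},
            {"name": "bband4", "domain": "internet"}]

paths = enumerate_paths(branch_a, branch_b)
print(len(paths))  # 4
```

If one of the circuits were in a different transport domain, the pairs involving it would drop out of the list, since no tunnel would exist between them.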

 

Each path is monitored by sending request-response style SLA probes at a configured interval. Because the network may treat forwarding classes differently, the SLA probes are sent per forwarding class. The probe protocol is an extension of ITU-T Y.1731/CFM. The computed metrics are delay, forward and reverse delay variation, and loss (statistical loss and actual traffic loss, in the forward and reverse directions).

 

SLA monitoring may be configured for all 16 forwarding classes (network control through fc 15). However, from a practical perspective, the recommended configuration is to perform monitoring for the 4 commonly used forwarding classes (which map to the 4 common behavior aggregates on IP networks) - network control, expedited forwarding, assured forwarding and best-effort. 

 

The SLA monitoring module stores a rolling window of recent metrics (the last 5 minutes' worth). The metrics are used as input to the path selection process and are also periodically exported to Versa Analytics. Analytics APIs enable users to write sophisticated visibility and control applications.
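A rolling window of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not Versa's implementation: the MetricWindow class, its field names, and the choice of a simple average as the summary are all hypothetical.

```python
from collections import deque

class MetricWindow:
    """Keep only the probe results from the last `window_seconds` seconds."""

    def __init__(self, window_seconds=300):  # 300 s = the 5-minute window
        self.window_seconds = window_seconds
        self.samples = deque()  # entries are (timestamp, delay_ms, lost)

    def add(self, timestamp, delay_ms, lost=False):
        self.samples.append((timestamp, delay_ms, lost))
        # Evict samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= timestamp - self.window_seconds:
            self.samples.popleft()

    def summary(self):
        """Summarize the window (assumed here: mean delay and loss percentage)."""
        delays = [d for _, d, lost in self.samples if not lost]
        total = len(self.samples)
        lost_count = sum(1 for _, _, lost in self.samples if lost)
        return {
            "avg_delay_ms": sum(delays) / len(delays) if delays else None,
            "loss_pct": 100.0 * lost_count / total if total else 0.0,
        }

window = MetricWindow()
window.add(0, 10)
window.add(100, 20)
window.add(350, 30)   # the sample from t=0 is now older than 300 s
print(window.summary())
```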

 

Below is a sample configuration that enables SLA monitoring per forwarding class at the tenant level. Interval controls how often SLA probes are sent on each path between the branches. Logging-interval controls how often the per-path SLA measurements are logged to Versa Analytics. Loss-threshold determines how many consecutive SLA probes must be lost before the path is declared down; the default value is 3 probes.

 

show orgs org Customer1 sd-wan
site {
    wan-interfaces {
        vni-0/0.0 {
            sla-monitoring {
                fc_ef {
                    interval         10;
                    loss-threshold   5;
                    logging-interval 300;
                }
                fc_af {
                    interval         10;
                    loss-threshold   5;
                    logging-interval 300;
                }
                fc_be {
                    interval         10;
                    loss-threshold   5;
                    logging-interval 300;
                }
            }
        }
    }
}



The figure illustrates the data flow between the SLA monitoring and path selection modules.


Loss Measurement

The SLA measurement protocol measures two types of traffic loss:

  • Statistical Loss Measurement 

  • Inline Loss Measurement

Statistical Loss Measurement

This is based on the loss of SLA PDUs on the SDWAN path during the measurement interval. The SLA PDUs act as synthetic data PDUs, so they can be used to measure loss even when there is no customer traffic. However, the measurement is not granular and may not detect small amounts of loss reliably or quickly, because it requires the SLA PDUs themselves to be dropped.

 

Inline Loss Measurement

This is a proprietary protocol used to measure the loss of customer traffic on the SDWAN path in the forward and reverse directions. It can detect even less than 1% loss, provided there is traffic flowing on the path. SLA PDUs carry the traffic usage information from the destination back to the source, which uses it to compute the inline forward and reverse loss ratios.

 

A combination of statistical and inline loss measurement can be used to detect packet loss in the network accurately and reliably.
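The difference in granularity between the two measurements can be illustrated with a little arithmetic. The sketch below assumes simple sent/received counters; the function names and counter values are hypothetical, and the actual inline protocol is proprietary and not described here.

```python
def statistical_loss_pct(probes_sent, probes_answered):
    """Loss estimated from SLA PDUs alone."""
    if probes_sent == 0:
        return 0.0
    return 100.0 * (probes_sent - probes_answered) / probes_sent

def inline_loss_pct(packets_sent, packets_received):
    """Loss computed from customer traffic counters reported by the far end."""
    if packets_sent == 0:
        return None  # no customer traffic, so inline loss cannot be measured
    return 100.0 * (packets_sent - packets_received) / packets_sent

# With 30 probes in a measurement interval, the smallest non-zero statistical
# loss is one probe, i.e. about 3.3%.
print(statistical_loss_pct(30, 29))

# With 10,000 customer packets in the same interval, 50 drops show up as 0.5%.
print(inline_loss_pct(10_000, 9_950))
```

The statistical figure only changes when a probe happens to be dropped, whereas the inline figure reflects every dropped customer packet but needs traffic on the path to say anything at all.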


Adaptive Monitoring

When SLA monitoring is configured on a WAN interface, the branch automatically starts monitoring paths to every neighbor and link learned through MP-BGP. In a full-mesh topology with a large number of branches, this can generate a lot of SLA traffic. Adaptive monitoring restricts SLA monitoring to neighbors that are actively passing traffic, which reduces the SLA monitoring processing and traffic on the network.

admin@Branch1-cli(config-wan-interfaces-vni-0/1.0)% show 
sla-monitoring {
    fc_nc {
        interval 1;
        adaptive-monitoring {
            inactivity-interval 300;
            suspend-interval    30;
            retries             3;
        }
    }
}


Adaptive monitoring configuration has 3 knobs:

  • Inactivity interval: How long a neighbor must be inactive before the sending of SLA monitoring PDUs is suspended.

  • Suspend interval: How long the path stays suspended after inactivity is detected.

  • Retries: Number of retries after coming out of the suspend state.

 

SLA PDUs are initially sent at every configured interval. If inactivity is detected between the neighbors, the path goes into the suspend state, during which no SLA PDUs are sent. If any traffic activity to the neighbor is detected while the path is suspended, it returns to the active state. If the neighbor is still inactive after the suspend interval, n SLA PDUs are sent (n = retries) and the path goes back to the suspend state. SLA metrics are not logged to analytics during the suspend interval.
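The behavior described above can be sketched as a two-state machine. This is an illustrative model only: the AdaptiveMonitor class and its method names are hypothetical, time is an abstract counter in the same units as the configured intervals, and tick() is assumed to be called once per probe interval.

```python
class AdaptiveMonitor:
    def __init__(self, inactivity_interval=300, suspend_interval=30, retries=3):
        self.inactivity_interval = inactivity_interval
        self.suspend_interval = suspend_interval
        self.retries = retries
        self.state = "active"
        self.last_traffic = 0
        self.suspended_at = None

    def on_traffic(self, now):
        """Any traffic to the neighbor returns the path to the active state."""
        self.last_traffic = now
        self.state = "active"
        self.suspended_at = None

    def tick(self, now):
        """Return how many SLA PDUs to send at time `now`."""
        if self.state == "active":
            if now - self.last_traffic >= self.inactivity_interval:
                self.state = "suspended"
                self.suspended_at = now
                return 0
            return 1
        # Suspended: stay silent until the suspend interval expires, then send
        # a burst of `retries` PDUs and start a new suspend interval.
        if now - self.suspended_at >= self.suspend_interval:
            self.suspended_at = now
            return self.retries
        return 0

monitor = AdaptiveMonitor()
print(monitor.tick(1), monitor.tick(300), monitor.tick(330))
```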

 

Run these CLI commands to see the SLA monitoring state and metrics:

  • admin@Branch1-cli> show orgs org Customer1 sd-wan sla-monitor path status 

  • admin@Branch1-cli> show orgs org Customer1 sd-wan sla-monitor path metrics


Versa SDWAN Policy

WAN path selection is governed by the currently active SDWAN policy. The policy is specific to a tenant, so each tenant on a multi-tenant branch can control its own path selection behavior.

 

The following figure illustrates the various components of an SDWAN policy:



Rules

The Versa SDWAN policy consists of one or more rules. A rule is used to identify traffic for which we want to specify path selection behavior. Traffic matching a rule is subject to steering according to the forwarding profile associated with it. Traffic that does not match a specific rule is subject to default behavior.

 

Versa software can match traffic based on any combination of Layer 3 (IP address, zone, DSCP marking, etc.), Layer 4 (L4 protocol, ports, etc.), and Layer 7 criteria. It recognizes over 2600 applications and 83 URL categories. Rule match criteria can also include arbitrary collections or groups of applications, as well as groups based on tags or attributes associated with each application. For example, FTP, SFTP, and TFTP (along with others) are tagged as "file transfer" applications. Versa supports various tags, and depending on which tags are used in the match condition, different groupings of applications can be created.

 

In addition to the match condition, the rule specifies a Forwarding Profile. The forwarding profile determines how the matching traffic will be steered on the WAN.


Forwarding Profiles 

The key components of a forwarding profile are:

  • Circuit/Path Priorities
  • Connection Selection Method and Load Balancing
  • Symmetric-Forwarding
  • SLA Violation Action
  • Recompute Interval
  • Continuous Evaluation
  • SLA Profile

Circuit/Path Priorities

An SDWAN path between two branches is a combination of a specific local access circuit (WAN interface) and a remote access circuit. Users can configure up to 4 WAN path priorities. Paths whose priority is not defined default to the lowest configurable priority. Paths are assigned run time priorities based on their configured priority, whether they are up or down, and whether they meet the SLA requirement associated with the forwarding profile. At any point in time, traffic is sent on the highest priority SLA compliant paths.

 

If a path fails to meet the SLA requirement associated with the forwarding profile, it is demoted to a system defined “SLA violated” priority, just below the lowest configurable path priority. The path selection logic then looks for other paths at a higher priority (which are, by definition, in compliance with the SLA).
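The run time priority assignment can be sketched as follows. The numeric encoding (1 is the highest priority, 4 the lowest configurable, 5 the "SLA violated" level) and the dictionary layout are assumptions made for illustration; the text only states the relative ordering.

```python
LOWEST_CONFIGURABLE = 4                 # up to 4 configurable priorities
SLA_VIOLATED = LOWEST_CONFIGURABLE + 1  # system level just below the lowest

def runtime_priority(path):
    """Return the run time priority of a path, or None if the path is down."""
    if not path["up"]:
        return None
    if not path["sla_ok"]:
        return SLA_VIOLATED
    # Paths with no configured priority default to the lowest configurable one.
    return path.get("configured_priority", LOWEST_CONFIGURABLE)

def select_paths(paths):
    """Return the names of all paths at the best available run time priority."""
    ranked = [(runtime_priority(p), p["name"]) for p in paths]
    ranked = [(prio, name) for prio, name in ranked if prio is not None]
    if not ranked:
        return []
    best = min(prio for prio, _ in ranked)
    return [name for prio, name in ranked if prio == best]

paths = [
    {"name": "bband1", "configured_priority": 1, "up": True, "sla_ok": False},
    {"name": "mpls1", "configured_priority": 2, "up": True, "sla_ok": True},
    {"name": "lte1", "up": True, "sla_ok": True},
]
print(select_paths(paths))  # ['mpls1']
```

In this example bband1 has the best configured priority but is out of SLA, so it is demoted below lte1 and traffic is sent on mpls1.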

Specifying Path Priorities

When specifying priorities, any combination of local and/or remote access circuit names or types may be provided. If only local access circuit names/types are specified, the remote name/type is wildcarded. Similarly, if only remote access circuit names/types are specified, local access circuit names/types are wildcarded.

Here are some examples of priority specification:

  • All paths of type “broadband” are preferred over paths of type “MPLS”, which are in turn preferred over paths of type “LTE”.

  • All paths originating at local access circuit “bband1” are preferred over paths originating at local access circuit called “mpls1”. 

  • All paths terminating at remote access circuit “bband4” are preferred over paths terminating at remote access circuit “bband1”.

  • The specific path from local access circuit “bband1” to remote access circuit “bband1” is preferred over any other path.

Avoiding Certain Paths

Users may explicitly configure certain access circuits or paths to be avoided. For example, it may be desirable to avoid sending non-mission-critical traffic on a high-cost LTE circuit.

The following is an example of a path priority specification in a forwarding profile:

 

show orgs org-services Customer1 sd-wan forwarding-profiles Fp-Video

sla-profile                 SLA-Class-Video;
circuit-priorities {
    priority 1 {
        circuit-names {
            local  [ ISP1 ISP2 ];
        }
    }
    priority 2 {
        circuit-names {
            local  [ MPLS ];
            remote [ MPLS ];
        }
    }
    avoid {
        circuit-names {
            local  [ LTE ];
            remote [ LTE ];
        }
    }
}
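One way to read the Fp-Video specification above is sketched below. It assumes that an unspecified side is a wildcard (as described earlier) and that, when both local and remote names are given, both must match; the second assumption and the helper names are illustrative only and are not confirmed by the text.

```python
LOWEST_CONFIGURABLE = 4

# Circuit priorities from the Fp-Video sample, as (priority, spec) pairs.
CIRCUIT_PRIORITIES = [
    (1, {"local": {"ISP1", "ISP2"}}),             # remote side is wildcarded
    (2, {"local": {"MPLS"}, "remote": {"MPLS"}}),
]
AVOID = {"local": {"LTE"}, "remote": {"LTE"}}

def spec_matches(spec, local, remote):
    """A missing side acts as a wildcard; a present side must contain the name."""
    local_ok = "local" not in spec or local in spec["local"]
    remote_ok = "remote" not in spec or remote in spec["remote"]
    return local_ok and remote_ok

def configured_priority(local, remote):
    if spec_matches(AVOID, local, remote):
        return "avoid"
    for priority, spec in CIRCUIT_PRIORITIES:
        if spec_matches(spec, local, remote):
            return priority
    return LOWEST_CONFIGURABLE  # undefined paths default to the lowest priority

print(configured_priority("ISP1", "MPLS"))  # 1
print(configured_priority("MPLS", "MPLS"))  # 2
print(configured_priority("MPLS", "ISP1"))  # 4
```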


Connection Selection Method and Load Balancing

By default, Versa does flow-based load balancing. All packets for a given flow are pinned to one path. Flows are load balanced among the highest priority SLA compliant paths. The “connection-selection-method” dictates the flow load balancing scheme. Currently, weighted round robin is used, which distributes flows across paths in proportion to their available bandwidth.
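Flow pinning with weighted round robin can be sketched as follows. The text does not say which weighted round robin variant is used; the sketch uses the "smooth" variant as one common choice, and the path names, weights, and flow_table structure are hypothetical.

```python
from collections import Counter
from itertools import islice

def weighted_round_robin(weights):
    """Yield path names in proportion to their (positive) weights."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen

flow_table = {}  # flow id -> pinned path

def path_for_flow(flow_id, picker):
    """Pick a path when the flow is first seen, then keep the flow pinned."""
    if flow_id not in flow_table:
        flow_table[flow_id] = next(picker)
    return flow_table[flow_id]

# Weights stand in for available bandwidth, e.g. 100 Mbps and 50 Mbps.
picker = weighted_round_robin({"bband1": 100, "mpls1": 50})
for flow_id in range(6):
    path_for_flow(flow_id, picker)
print(Counter(flow_table.values()))
```

Six new flows land on the two paths in a 4:2 split, and every later packet of a flow reuses the path recorded in flow_table.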

 

The “load-balancing” configuration can be set to “per-packet” instead of the default “per-flow”, in which case, packets for a flow will be load balanced on all eligible paths.

Symmetric-forwarding

This setting dictates how the device steers reverse direction traffic (i.e., traffic returning from the destination branch to the origin branch). By default, the return traffic is forwarded symmetrically, i.e., on the same path on which the traffic was received. However, for applications where it is beneficial to choose the best path independently in each direction, symmetric forwarding may be turned off.

 

SLA violation action

If no paths meet the SLA requirements associated with the forwarding profile towards a specific branch, a user configured action is taken for traffic destined to that branch: 

  1. Continue to forward traffic on the SLA violated paths (default behavior).

  2. Drop traffic. Affected traffic will be dropped, and forwarding will resume only when there is at least one SLA compliant path.
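The two actions can be summarized in a few lines. The action names "forward" and "drop" used below are placeholders for illustration and are not necessarily the configuration keywords.

```python
def eligible_paths(compliant_paths, violated_paths, violation_action="forward"):
    """Return the paths that traffic towards a branch may use."""
    if compliant_paths:
        return list(compliant_paths)
    if violation_action == "forward":
        return list(violated_paths)   # default: keep using SLA violated paths
    return []                         # drop until a compliant path returns

print(eligible_paths([], ["bband1", "mpls1"]))          # ['bband1', 'mpls1']
print(eligible_paths([], ["bband1", "mpls1"], "drop"))  # []
```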

Recompute interval

Each Versa branch device evaluates the state of its configured forwarding profiles periodically, to determine which paths are in compliance. The evaluation period is controlled by the recompute timer. The state of each forwarding profile is evaluated towards each remote branch, for each path towards that branch, for each forwarding class for which SLA monitoring is configured. 

 

The evaluation is performed by summarizing the SLA metrics (delay, jitter, loss) recorded by the SLA monitoring module. Paths are assigned run time priorities based on whether the metrics are within the configured SLA. Paths that meet the SLA are assigned their configured priority. Paths that do not meet the SLA are assigned a system internal “SLA Violated” priority. 

Continuous evaluation

Path selection for a flow is normally done in only two cases: first, when the flow is created, and second, when the path to which the flow is pinned is determined to be down. That is, even if a path goes out of SLA compliance, existing flows that were pinned to it continue to use it. Only new flows use other (compliant) paths. This is the default and recommended configuration.

 

However, for real time applications such as voice and video, it may be desirable to always switch to a better path when it becomes available. For such applications, continuous evaluation can be turned on in the corresponding forwarding profile. Existing flows will then switch to better paths throughout their lifetime, but at most once every recompute interval.
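The difference between the default behavior and continuous evaluation can be sketched as follows. The flow dictionary, the reevaluate helper, and the time values are hypothetical; the sketch only models the rule that a flow moves off a down path immediately and otherwise moves at most once per recompute interval when continuous evaluation is on.

```python
def reevaluate(flow, best_path, path_is_up, continuous_evaluation,
               now, recompute_interval):
    """Return the path the flow should use at time `now`."""
    if not path_is_up(flow["path"]):
        # A flow always moves off a path that is down.
        flow["path"], flow["last_switch"] = best_path, now
    elif (continuous_evaluation
          and best_path != flow["path"]
          and now - flow["last_switch"] >= recompute_interval):
        # With continuous evaluation, move to the better path, but no more
        # often than once every recompute interval.
        flow["path"], flow["last_switch"] = best_path, now
    return flow["path"]

def always_up(path):
    return True

flow = {"path": "bband1", "last_switch": 0}
print(reevaluate(flow, "mpls1", always_up, False, now=60, recompute_interval=60))
print(reevaluate(flow, "mpls1", always_up, True, now=60, recompute_interval=60))
```

The first call leaves the flow on bband1 because continuous evaluation is off; the second moves it to mpls1.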

 

Together, the recompute interval and continuous evaluation settings define how quickly the system reacts to changes in network SLA characteristics.

SLA Profile

The SLA profile defines application/network thresholds which a path must meet in order for it to be SLA compliant. Currently, the following metrics are supported for defining an SLA:

  • Round trip delay

  • Forward delay variation

  • Reverse delay variation

  • Packet loss percentage

  • Circuit transmit utilization

  • Circuit receive utilization

 

Multiple parameters may be specified in the SLA profile. The SLA is deemed violated on a path if any one of the summary metrics is above the threshold specified in the SLA profile.
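The "any one metric above its threshold" rule can be sketched as follows. The threshold values are taken from the SLA-Class-Video sample in this section; the summary values, the sla_violations helper, and the decision to skip metrics that have no measurement are assumptions made for illustration.

```python
SLA_CLASS_VIDEO = {
    "latency": 120,
    "loss-percentage": 10,
    "delay-variation": 120,
}

def sla_violations(profile, summary):
    """Return the names of the metrics whose summary value exceeds the threshold."""
    return [
        metric
        for metric, threshold in profile.items()
        if summary.get(metric) is not None and summary[metric] > threshold
    ]

def is_compliant(profile, summary):
    # The SLA is violated as soon as any single metric is over its threshold.
    return not sla_violations(profile, summary)

summary = {"latency": 80, "loss-percentage": 12, "delay-variation": 15}
print(sla_violations(SLA_CLASS_VIDEO, summary))  # ['loss-percentage']
print(is_compliant(SLA_CLASS_VIDEO, summary))    # False
```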

 

Instead of, or in addition to specifying thresholds for one or more metrics, the user may also specify that the path with the best metric be used. The following are supported:

  • Lowest latency

  • Lowest delay variation

  • Lowest packet loss

show orgs org-services Customer1 sd-wan sla-profiles 

SLA-Class-Video {
    latency                      120;
    loss-percentage              10;
    delay-variation              120;
    circuit-transmit-utilization 60;
    circuit-receive-utilization  60;
    low-latency;
    low-packet-loss;
    low-delay-variation;
}



Applying SDWAN policy

When traffic enters the Versa FlexVNF, it is assigned a forwarding class in order to prioritize traffic forwarding within the FlexVNF. The forwarding class is determined by QoS or application QoS rules defined under orgs org-services class-of-service. For example, there may be a rule to put all SIP/RTP/RTCP traffic into the expedited forwarding class (fc_ef), and all business critical TCP traffic into the assured forwarding class (fc_af). If no rules are configured, the traffic is assigned to the best-effort forwarding class.

 

In order to perform SLA based path selection, SDWAN policy rules refer to the SLA compliance state of the forwarding class to which the traffic is assigned. Continuing the above example, assume that an SDWAN policy rule has been created to match RTP traffic and is associated with a forwarding profile called “fp-voice”. Since the RTP traffic is assigned to the fc_ef forwarding class, the Versa FlexVNF consults the SLA path state of forwarding profile fp-voice for forwarding class fc_ef to choose an SLA compliant path.


Path failure detection

The SLA monitoring module deems a path to be down if a configured number of SLA probes are lost consecutively. When this happens, a notification is sent to the path selection module to immediately take the path out of rotation and send traffic on other eligible paths. 

 

The path is deemed to be up again as soon as SLA probes successfully start completing the round trip to the remote branch and back.
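The up/down decision can be sketched as a counter of consecutive lost probes. The PathLiveness class is hypothetical; the default of 3 matches the default loss threshold stated earlier.

```python
class PathLiveness:
    def __init__(self, loss_threshold=3):
        self.loss_threshold = loss_threshold
        self.consecutive_lost = 0
        self.up = True

    def on_probe_result(self, answered):
        """Update the path state from one probe result and return whether it is up."""
        if answered:
            # A completed round trip brings the path back up immediately.
            self.consecutive_lost = 0
            self.up = True
        else:
            self.consecutive_lost += 1
            if self.consecutive_lost >= self.loss_threshold:
                self.up = False
        return self.up

path = PathLiveness()
results = [path.on_probe_result(r) for r in (False, False, False, True)]
print(results)  # [True, True, False, True]
```

Under this model, the worst-case detection time is roughly the probe interval multiplied by the loss threshold, which is why both settings affect how quickly traffic moves off a failed path.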

Path Dampening

If a path constantly flaps between up and down state, it is possible to keep it out of rotation for some time, to avoid constant migration of traffic from one path to another. This is achieved via the link dampening feature, where a path that experiences too many flaps in a given interval is suppressed for a hold interval. The flap count, flap interval and hold interval are all configurable. 

 

Path dampening is configured at the system level. The following is an example configuration:

show system sd-wan sla-monitor

flap-threshold        10;
dampen-eval-interval  120;
dampen-clear-interval 300;
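The dampening logic can be sketched as a sliding-window flap counter. The mapping assumed here is that flap-threshold is the flap count, dampen-eval-interval is the flap interval, and dampen-clear-interval is the hold interval; that mapping, the use of "at least flap-threshold flaps" as the trigger, and the PathDampener class are all assumptions.

```python
from collections import deque

class PathDampener:
    def __init__(self, flap_threshold=10, eval_interval=120, clear_interval=300):
        self.flap_threshold = flap_threshold
        self.eval_interval = eval_interval
        self.clear_interval = clear_interval
        self.flaps = deque()          # timestamps of recent up/down transitions
        self.suppressed_until = None

    def record_flap(self, now):
        self.flaps.append(now)
        # Only count flaps that fall inside the evaluation interval.
        while self.flaps and self.flaps[0] <= now - self.eval_interval:
            self.flaps.popleft()
        if len(self.flaps) >= self.flap_threshold:
            self.suppressed_until = now + self.clear_interval
            self.flaps.clear()

    def is_suppressed(self, now):
        return self.suppressed_until is not None and now < self.suppressed_until

dampener = PathDampener()
for t in range(0, 100, 10):       # 10 flaps within 120 time units
    dampener.record_flap(t)
print(dampener.is_suppressed(200), dampener.is_suppressed(400))
```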