Question: How long does an SD-WAN Endpoint continue to work when the SD-WAN Controller goes down?


Solution:

The following sequence describes how SD-WAN maintains endpoint availability when a Branch-CPE (S1) loses its connection to the SD-WAN Controllers:

  • Branch-CPE1 (S1) continues to forward traffic based on the last route table that it received from the SD-WAN Controllers before it lost connection to all of them (up to four).
  • Branch-CPE1 (S1) continues to route within the branch using its internal routers.
  • After losing connection to all SD-WAN Controllers, Branch-CPE1 (S1) stops receiving route updates, security associations, and other information that is communicated using MP-BGP.
  • Therefore, once the remote Branch-CPE2 (S2) changes its security association, the disconnected Branch-CPE1 (S1) can no longer communicate with Branch-CPE2 (S2).
    NOTE: By default, each branch (for example, S2) rotates its security association (keys) every seven hours.
  • After eight hours, the routes learned from the SD-WAN Controller expire.
  • From then on, Branch-CPE1 (S1) routes based on underlay routing (MPLS or broadband), if it is configured to do so.
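
The sequence above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name, its parameters, and the state labels are hypothetical, and the seven-hour and eight-hour values are the defaults quoted above, not values read from a device.

```python
def branch_state(hours_since_loss, hours_until_peer_rekey=7.0,
                 route_expiry_hours=8.0, underlay_fallback=False):
    """Describe how S1 reaches S2 after S1 loses all SD-WAN Controllers.

    hours_until_peer_rekey is the time left until S2 next rotates its keys
    (at most seven hours with the default rotation interval).
    """
    if hours_since_loss >= route_expiry_hours:
        # Routes learned from the controller have expired.
        return "underlay routing" if underlay_fallback else "no overlay routes"
    if hours_since_loss >= hours_until_peer_rekey:
        # S2 has rekeyed and S1 never received the new security association.
        return "routes retained, S2 unreachable"
    return "forwarding on last route table"


for hours in (1, 7.5, 9):
    print(hours, branch_state(hours, underlay_fallback=True))
```

Raising the IPSec re-key interval and the route retention time, as configured in the steps below, corresponds to increasing hours_until_peer_rekey and route_expiry_hours in this model.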

Versa uses the IETF Graceful Restart (GRES) procedures to handle SD-WAN Controller failures. Versa extends the retention time beyond 24 hours by adding a multiplier to the GRES timers, which allows a Versa FlexVNF branch to retain the routes that it learned from the SD-WAN Controller for more than 24 hours.
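
As a rough illustration of the arithmetic, the retention period can be treated as the base graceful-restart timer multiplied by the multiplier. The Python sketch below assumes exactly that relationship (the example configuration at the end of this article annotates 3600 seconds with a multiplier of 72 as 72 hours). The function names are hypothetical and are not part of any Versa tooling.

```python
import math


def retention_hours(base_timer_seconds, multiplier):
    """Route retention in hours, assuming retention = base timer x multiplier."""
    return base_timer_seconds * multiplier / 3600


def multiplier_for(target_hours, base_timer_seconds):
    """Smallest whole multiplier that covers target_hours of retention."""
    return math.ceil(target_hours * 3600 / base_timer_seconds)


print(retention_hours(3600, 72))  # base timer of one hour, multiplier 72
print(multiplier_for(24, 3600))   # multiplier needed to cover 24 hours
```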


Follow these steps to configure SD-WAN endpoint availability, based on the behavior described above:

  1. Select Appliance Context > Configurations > Services > IPSec > Branch SDWAN Profile in the dashboard. Click + to open the Add Branch SDWAN Profile window and create a branch SDWAN profile. Increase the IPSec re-key and life-timer values from their defaults.
    NOTE: In a full-mesh deployment, configure the IPSec re-key and life-timer values on all branches.
  2. Select Appliance Context > Configurations > Services > IPSec > VPN Profile and select the controller to which you added the SD-WAN profile. Click the controller to open the Edit IPSec VPN window.
  3. In the Edit IPSec VPN window, check that the newly added Branch-2-Branch SD-WAN profile appears in the Branch SD-WAN Profile drop-down list.
  4. Run the show orgs org-services Tenant1 ipsec vpn-profile SDWAN-Controller-Profile branch-2-branch security-associations brief CLI command to verify the updated timer values in the newly added SD-WAN profile.
    admin@Branch1-cli> show orgs org-services Tenant1 ipsec vpn-profile SDWAN-Controller-Profile branch-2-branch security-associations brief
    Remote Gateway   Transform  Inbound SPI  Bytes/sec  Outbound SPI  Bytes/sec  Up Time    Next Rekey Time
    ---------------  ---------  -----------  ---------  ------------  ---------  ---------  ---------------
    10.1.0.102       aes-gcm    0x50120066   0          0x50520065    0          26826 sec  577149 sec
    [ok][2017-08-18 07:29:58]
    admin@Branch1-cli>


  5. (Optional: perform this step only if the IPSec timers shown in step 4 do not reflect the configured values.)
    Run the vsh allow-cli command to allow configuration changes (disabling and enabling the WAN link) from the CLI.
    admin@branch1-cli> exit
    admin@branch1:/var/log$ vsh allow-cli
    Enter password: CMD_MAAPI is true [mtid = 0]
    CMD_MAAPI is true [mtid = 18610]
    CMD_MAAPI is true [mtid = 18610]
    CMD_MAAPI is true [mtid = 0]


    • Run the set interfaces vni-0/0 enable false CLI command to disable the WAN link, which brings down the BGP neighborship with the controller.
      admin@branch1-cli(config)% set interfaces vni-0/0 enable false
      [ok][2017-08-22 01:47:07]


    • Run the commit CLI command to save the changes.
      admin@branch1-cli(config)% commit
      No modifications to commit.
      [ok][2017-08-22 01:47:09]


    • Run the run show interfaces brief CLI command to view the status of the interface.
      admin@branch1-cli(config)% run show interfaces brief
      vni-0/0.0  52:54:00:31:f9:71  down  down   2       mpls1-Transport-VR  172.16.1.67/27 =====> WAN link is down
      vni-0/1    52:54:00:ab:e9:79  up    up     -       -


    • Run the set interfaces vni-0/0 enable true CLI command to re-enable the WAN link. This re-establishes the BGP neighborship with the controller so that the branch receives the new SA timers.
      admin@branch1-cli(config)% set interfaces vni-0/0 enable true
      [ok][2017-08-22 01:47:20]


    • Run the commit CLI command to save the changes.
      admin@branch1-cli(config)% commit
      Commit complete.
      [ok][2017-08-22 01:47:23]


    • Run the run show interfaces brief CLI command to view the status of the interface.
      admin@branch1-cli(config)% run show interfaces brief
      NAME       MAC                OPER  ADMIN  TENANT  VRF                 IP
      -----------------------------------------------------------------------------------------
      vni-0/0.0  52:54:00:31:f9:71  up    up     2       mpls1-Transport-VR  172.16.1.67/27
      
      [ok][2017-08-22 01:47:27]


  6. Increase the GRES timer values to the maximum, and set the multiplier according to your needs. The maximum configurable value is 255.
    NOTE: Multiplier support is available from Release 16.1 S2 onwards.

    • Run the show configuration routing-instances Tenant1-Control-VR protocols bgp graceful-restart | display set CLI command to view the set commands that configure the graceful-restart timers.
      admin@Branch1-cli> show configuration routing-instances Tenant1-Control-VR protocols bgp graceful-restart | display set
      set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart enable
      set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart maximum-restart-time 3600
      set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart recovery-time 3600
      set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart select-defer-time 3600
      set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart stalepath-time 3600
      set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart multiplier 72


  7. Run the show configuration routing-instances Tenant1-Control-VR protocols bgp graceful-restart CLI command to view the final configuration.
    admin@Branch1-cli> show configuration routing-instances Tenant1-Control-VR protocols bgp graceful-restart
    2 {
        graceful-restart {
            enable;
            maximum-restart-time 3600;
            recovery-time        3600;
            select-defer-time    3600;
            stalepath-time       3600;
            multiplier           72;  ====================> Sets the availability to 72 hours (72 x 1 hour)
        }
    }
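
As a sanity check on the values above, the display set output can be parsed and compared against the IPSec timers from step 4. The Python sketch below is an illustration under stated assumptions: the parsing helper is hypothetical, the retention period is assumed to be stalepath-time multiplied by the multiplier (all four timers are 3600 seconds in this example), and the check that the security association should outlive the retention window is an inference from the behavior described at the start of this article, not a documented rule.

```python
import re

# Output of "show configuration ... graceful-restart | display set" from step 6.
SET_OUTPUT = """\
set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart enable
set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart maximum-restart-time 3600
set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart recovery-time 3600
set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart select-defer-time 3600
set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart stalepath-time 3600
set routing-instances Tenant1-Control-VR protocols bgp 2 graceful-restart multiplier 72
"""


def parse_graceful_restart(text):
    """Return {option: value} for the numeric graceful-restart options."""
    values = {}
    for line in text.splitlines():
        match = re.search(r"graceful-restart (\S+) (\d+)$", line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


cfg = parse_graceful_restart(SET_OUTPUT)
retention_seconds = cfg["stalepath-time"] * cfg["multiplier"]  # assumed relationship
next_rekey_seconds = 577149  # "Next Rekey Time" from the step 4 sample output

print(retention_seconds // 3600, "hours of route retention")
print("SA outlives retention window:", next_rekey_seconds >= retention_seconds)
```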