Table of Contents
Purpose and Prerequisite.
Issue: SLA down between branch to branch.
1. Check SLA status on both branches.
2. Check the ICMP reachability between both branches.
3. Verify SLA Packets reachability over UDP 4790.
4. Check if SLA packets drop seen in one direction.
5. Check IPsec tunnel information.
6. Log files.
7. Command outputs to collect.
8. SLA NAT Combinations between Local and Remote VOS-Appliances.
9. Contact Support.
Purpose
This document helps troubleshoot an SLA down issue between Branches. SLA packets are sent to track reachability over each underlay between Branches. By default, SLA packets are marked with TOS “ef” and are sent by both Branches over UDP 4790.
Prerequisite
The SLA, BGP, and IPsec tunnel should be up and stable with the Controller. If they are not, refer to the KB article on troubleshooting SLA and tunnel down with the Controller.
Issue: SLA down between branch to branch
1. Check SLA status on both the branches
In this example, SLA is down for the INTERNET transport and the PDU loss is 100%.
#Branch-1
Use the following command to check the SLA status between the branches:
show orgs org <org-name> sd-wan sla-monitor status <remote-branch-name>
Use the following command to find the public IP and public port of the remote branch, which give the remote branch's underlay connectivity details:
show orgs org <org-name> sd-wan detail <remote-branch-name>
If NAT is in use, the NAT status is true, and the tcpdump captures described in steps 3 and 4 should be collected on the NATed IP/port.
#Branch-2
2. Check the ICMP reachability between both the branches
Use ping and traceroute to check reachability over the WAN link where SLA is down. If ICMP is failing, there may be an issue with the underlay/WAN link, which needs to be checked.
If ping/traceroute is blocked or working fine, move to step 3.
#Branch-1
Use the following commands to run ping and traceroute:
ping <remote-public-ip> routing-instance <transport-vr> source <transport-IP>
traceroute <remote-public-ip> routing-instance <transport-vr>
#Branch-2
3. Verify SLA packets reachability over UDP 4790 or NATed Datapath Port
Even if ICMP is working, UDP 4790 may still be blocked or dropped. Running tcpdump on both branches helps confirm the drop and whether it occurs in a specific direction.
#Branch-1
Run tcpdump on the egress vni-0/x (transport) interface, filtering on the Branch-2 public IP and public port. Here we see traffic going out to Branch-2 but no return traffic. This is the case where UDP 4790 is dropped in both directions.
Use the following command to run tcpdump and apply a filter:
tcpdump vni-x/x filter "host <remote-public-IP> and port <public-port>"
#Branch-2
Run tcpdump filtering on the Branch-1 WAN IP. Here we see only outgoing packets and no incoming packets from Branch-1.
Based on the packet captures, we can conclude that UDP 4790 traffic is blocked in both directions on the underlay/WAN.
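As an illustration of how the capture command changes when NAT is involved, here is a minimal Python sketch that builds the tcpdump line from the template above. The function name, the interface name, and the sample addresses are hypothetical, and the script is meant for a workstation, not the VOS CLI. It assumes the default port 4790 unless a NATed public port is passed in.

```python
DEFAULT_SLA_PORT = 4790  # default UDP port for SLA packets, per this article


def build_capture_command(interface, remote_public_ip, public_port=DEFAULT_SLA_PORT):
    """Return the tcpdump CLI line for one transport interface.

    Pass the NATed public port when the remote branch shows NAT status true;
    otherwise the default SLA port is used.
    """
    capture_filter = f"host {remote_public_ip} and port {public_port}"
    return f'tcpdump {interface} filter "{capture_filter}"'


# Placeholder values (documentation address range), not real branch details.
print(build_capture_command("vni-0/0", "203.0.113.10"))
print(build_capture_command("vni-0/0", "203.0.113.10", public_port=1024))
```

The remote public IP and public port come from the sd-wan detail output collected in step 1.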
4. Check if SLA packets drop seen in one direction
Here is a case where SLA packets sent from Branch-1 are received by Branch-2, but SLA packets from Branch-2 are not seen by Branch-1 (it can be vice versa as well). Traffic (UDP 4790) is lost in only one direction, but the issue is still related to the underlay/WAN link. UDP 4790 dropped in one direction.
#Branch-1
#Branch-2
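The reasoning in steps 3 and 4 can be summarised in a small Python sketch. It assumes both branches are sending SLA packets (outgoing packets are visible in both captures) and takes only one observation per branch: whether incoming SLA packets from the peer appear in the capture. The function name and return strings are illustrative, not part of any Versa tool.

```python
def classify_drop(branch1_sees_incoming, branch2_sees_incoming):
    """Classify the drop direction from the two tcpdump captures.

    Each argument is True when that branch's capture shows SLA packets
    arriving from the other branch.
    """
    if branch1_sees_incoming and branch2_sees_incoming:
        return "SLA packets arrive in both directions"
    if not branch1_sees_incoming and not branch2_sees_incoming:
        return "UDP 4790 dropped in both directions"
    if branch2_sees_incoming:
        # Branch-1's packets reach Branch-2, but not the other way round.
        return "UDP 4790 dropped from Branch-2 to Branch-1"
    return "UDP 4790 dropped from Branch-1 to Branch-2"


print(classify_drop(False, False))
print(classify_drop(False, True))
```

In either drop case the next place to look is the underlay/WAN path, as described above.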
5. Check IPsec tunnel information
Provide the dynamic tunnel output.
Use the following command to check the dynamic tunnel status:
show interfaces dynamic-tunnels
Also provide the outputs below, and let us know if any drop counters increase over multiple iterations.
Use the following commands to get the branch-2-branch security association brief and detail outputs, including counters:
show orgs org-services <org-name> ipsec vpn-profile <profile-name> branch-2-branch security associations brief
show orgs org-services <org-name> ipsec vpn-profile <profile-name> branch-2-branch security associations detail
Note: If the SLA status output shows PDU loss of 100%, then once the SLA goes down, the dynamic tunnel is brought down as well.
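To spot drop counters that increase over multiple iterations, the counter values from each run can be compared. The Python sketch below assumes the counters have already been copied by hand into one dictionary per iteration. The counter names are hypothetical and do not reflect the actual field names in the security association output.

```python
def increasing_counters(snapshots):
    """Return the names of counters that grew between the first and last snapshot.

    snapshots: list of dicts mapping counter name -> integer value,
    ordered from the first iteration to the last.
    """
    first, last = snapshots[0], snapshots[-1]
    return sorted(name for name in last if last[name] > first.get(name, 0))


# Hypothetical counter names and values, for illustration only.
runs = [
    {"drop_counter_a": 0, "drop_counter_b": 5},
    {"drop_counter_a": 2, "drop_counter_b": 5},
    {"drop_counter_a": 7, "drop_counter_b": 5},
]
print(increasing_counters(runs))
```

Counters that stay flat between runs are less likely to be related to the current SLA issue.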
6. Log files
If the issue is also observed in IPsec, provide the log files below and the configuration from both branches.
/var/log/versa/versa-ipsec.log
/var/log/versa/versa-ipsec-ctrl.log
/var/log/versa/versa-infmgr.log
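If it helps to send the logs as a single file, the following Python sketch bundles the three files listed above into one gzip tar archive. It assumes Python is available on the host where the logs reside or have been copied. The function name and archive name are hypothetical. Files that do not exist are skipped and reported, not treated as errors.

```python
import os
import tarfile

LOG_FILES = [
    "/var/log/versa/versa-ipsec.log",
    "/var/log/versa/versa-ipsec-ctrl.log",
    "/var/log/versa/versa-infmgr.log",
]


def bundle_logs(archive_path, paths=LOG_FILES):
    """Add each existing file to a gzip tar archive; return the missing paths."""
    missing = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in paths:
            if os.path.isfile(path):
                archive.add(path)
            else:
                missing.append(path)
    return missing
```

For example, `bundle_logs("branch1-logs.tar.gz")` would create the archive in the current directory and return the list of any log files that were not found. Run it once per branch so the two sets of logs stay separate.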
7. Command outputs
show orgs org <org-name> sd-wan sla-monitor status <remote-branch-name>
show orgs org <org-name> sd-wan detail <remote-branch-name>
show arp routing-instance <transport-vr>
ping <next-hop-ip> routing-instance <transport-vr> source <transport-IP>
ping <remote-public-ip> routing-instance <transport-vr> source <transport-IP>
traceroute <remote-public-ip> routing-instance <transport-vr>
tcpdump vni-x/x filter "host <remote-public-IP> and port <public-port>"
show interfaces dynamic-tunnels
show orgs org-services <org-name> ipsec vpn-profile <profile-name> branch-2-branch security associations brief
show orgs org-services <org-name> ipsec vpn-profile <profile-name> branch-2-branch security associations detail
Log in to vsmd:
>> vsh connect vsmd
>> show vsf tunnel nat-info
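To avoid typing mistakes when running the CLI commands above on both branches, the placeholders can be filled in once and the full list printed. This is a minimal Python sketch intended for a workstation. The keyword names (`org`, `remote_branch`, and so on) and the sample values are hypothetical stand-ins for the angle-bracket placeholders. A missing value raises a KeyError, so an incomplete checklist does not go unnoticed.

```python
COMMAND_TEMPLATES = [
    "show orgs org {org} sd-wan sla-monitor status {remote_branch}",
    "show orgs org {org} sd-wan detail {remote_branch}",
    "show arp routing-instance {transport_vr}",
    "ping {next_hop_ip} routing-instance {transport_vr} source {transport_ip}",
    "ping {remote_public_ip} routing-instance {transport_vr} source {transport_ip}",
    "traceroute {remote_public_ip} routing-instance {transport_vr}",
    'tcpdump {interface} filter "host {remote_public_ip} and port {public_port}"',
    "show interfaces dynamic-tunnels",
    "show orgs org-services {org} ipsec vpn-profile {profile} "
    "branch-2-branch security associations brief",
    "show orgs org-services {org} ipsec vpn-profile {profile} "
    "branch-2-branch security associations detail",
]


def render_commands(**values):
    """Fill every template with the supplied values; raises KeyError if one is missing."""
    return [template.format(**values) for template in COMMAND_TEMPLATES]


# Placeholder values for illustration only.
example = render_commands(
    org="Tenant1", remote_branch="Branch-2", transport_vr="INTERNET-Transport-VR",
    next_hop_ip="192.0.2.1", transport_ip="192.0.2.2", remote_public_ip="203.0.113.10",
    interface="vni-0/0", public_port=4790, profile="Profile1",
)
print("\n".join(example))
```

The vsmd commands are left out of the template list because they are run from a separate shell.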
Note:
If the Primary Controller goes down and a Branch then reboots for any reason, the SLA to this Branch from other SD-WAN Branches remains down after the Branch comes back up. This is a known limitation and is fixed from versions 16.1R2S10 and 20.2.1.
8. SLA NAT Combinations between Local and Remote VOS-Appliances:
*If a Hub is behind a NAT device/firewall and has a Private-IP, a one-to-one NAT mapping is always recommended.
Possible Combinations:
Legend:
1) EI: EndPoint Independent NAT
2) ED: EndPoint Dependent NAT
3) Pub-IP: Public IP
-> SLA between two Spokes that are both behind ED NAT will not come up. Configure them as Spoke-to-Spoke via a Hub that either has a Public-IP or has 1-to-1 NAT configured.
-> SLA between two Spokes that are both behind EI NAT works fine.
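The two rules above can be written as a small lookup. The Python sketch below encodes only the outcomes stated explicitly in this article (EI with EI, and ED with ED). Every other combination returns None, meaning "not stated here", and is not guessed at. The names are illustrative.

```python
EI, ED, PUB = "EI", "ED", "Pub-IP"

# Only the Spoke-to-Spoke outcomes stated in this article are listed.
STATED_OUTCOMES = {
    frozenset([EI]): True,   # both Spokes behind EI NAT: SLA works
    frozenset([ED]): False,  # both Spokes behind ED NAT: SLA will not come up
}


def sla_expected(local_nat, remote_nat):
    """Return True/False when the article states the outcome, else None."""
    return STATED_OUTCOMES.get(frozenset([local_nat, remote_nat]))


print(sla_expected(EI, EI))
print(sla_expected(ED, ED))
print(sla_expected(EI, ED))
```

A frozenset key makes the lookup independent of which side is local and which is remote.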
Scenario-1:
SLA packets are sent round-robin to the public IPs only if both Branches are behind NAT (to handle a multi-level NAT hierarchy). A tcpdump on the sending Spoke should show SLA packets being sent alternately to the remote Spoke's private interface IP and to the NAT IP learned from the VBP, until the SLA is established.
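To make the expected tcpdump pattern concrete, here is a minimal Python sketch of the alternation between the two destinations. It only models the order of destinations one would expect to see in the capture and is not the appliance's implementation. Starting with the private IP is an assumption, and the addresses are placeholders.

```python
from itertools import cycle, islice


def sla_probe_targets(private_ip, nat_ip, count):
    """Return the first `count` destinations, alternating private IP and NAT IP."""
    return list(islice(cycle([private_ip, nat_ip]), count))


# Placeholder addresses for illustration only.
for destination in sla_probe_targets("10.0.0.2", "203.0.113.7", 4):
    print(destination)
```

If the capture shows packets going to only one of the two addresses while both Branches are behind NAT, that differs from the alternation described in this scenario.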
Scenario-2:
If a Spoke has a Public-IP and the Hub is behind a Firewall with a Private-IP and one-to-one NAT translation:
-> Spokes behind Public-IPs work fine.
vsh connect vsmd:
vsm-vcsn0> show vsf tunnel branch-table
Get the "ET PTVI" and "Tnt ID" values for the remote Branch whose SLA is not coming up:
vsm-vcsn0> show vsf tunnel nat-info ptvi <ET PTVI> <Tenant-ID> detail
vsm-vcsn0> show vsf tunnel nat-info ptvi <ET PTVI> <Tenant-ID> stats
9. Contact Support
Please reach out to Versa-Support <support@versa-networks.com> with the above outputs.