If an existing node in the cluster fails and must be replaced (new bare-metal hardware or a re-spun VM), first make sure the new node is brought up with the same software version as the existing nodes, the same number of interfaces, and a similar CPU/memory/hyper-threading profile.
After that, follow the steps below to re-add the node to the cluster.
Step1: Copy /opt/versa/scripts/van-scripts/vansetup.conf from any existing search or analytic node (matching the personality being replaced) to the new node.
Step2: Make sure all the interface addresses on the new node are the same as on the old node (set up /etc/network/interfaces correctly).
Step3: Copy the contents of /etc/hosts from any existing node to the new node. Also add an entry mapping the local listener address to the hostname if it is missing.
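The hosts-file check in Step3 can be sketched as a small shell helper. This is only an illustrative sketch and not part of the Versa tooling: the function name ensure_hosts_entry and the sample address and hostname in the usage note are hypothetical. It appends the mapping only when no existing line already maps the address to the hostname.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Versa tooling): append an
# "address hostname" mapping to a hosts file only if no existing line
# already maps that address to that hostname.
ensure_hosts_entry() {
    hosts_file=$1   # for example /etc/hosts
    addr=$2         # local listener address of the new node
    name=$3         # hostname of the new node

    # A line matches when its first field is the address and one of the
    # remaining fields is the hostname. Comment lines start with "#",
    # so their first field never equals the address.
    if awk -v a="$addr" -v n="$name" '
        $1 == a { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$hosts_file"; then
        echo "present"
    else
        printf '%s\t%s\n' "$addr" "$name" >> "$hosts_file"
        echo "added"
    fi
}
```

Example usage as root, with placeholder values: ensure_hosts_entry /etc/hosts 192.0.2.15 analytics-03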
Step4: Modify vansetup.conf on the new node to reflect the correct rpc address and listener address. For the "analytic" personality, also set the seeds to one of the existing analytic nodes (seeds="a.b.c.d", where a.b.c.d is the listener address of an existing analytic node).
Step5: For an analytic node replacement, first remove the host-id of the node being replaced from the Cassandra cluster:
- Check nodetool status on any existing analytic node to find the host-id
- Remove the host-id of the node being replaced, as below
nodetool removenode <host-id of node being replaced>
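To avoid copying the host-id by hand, the lookup in Step5 could be scripted roughly as follows. This is a sketch under stated assumptions: host_id_for is a hypothetical helper name, and it assumes the usual "nodetool status" layout in which each node line starts with a two-letter status/state code, followed by the address, with the Host ID as the second-to-last column before Rack.

```shell
#!/bin/sh
# Hypothetical helper: read "nodetool status" output on stdin and print
# the Host ID of the node with the given address.
# Assumes the Host ID is the second-to-last column (Rack is last), which
# holds whether Load is shown as "1.2 GiB" or as "?".
host_id_for() {
    awk -v a="$1" '$2 == a && $1 ~ /^[UD][NLJM]$/ { print $(NF-1) }'
}
```

Example usage on an existing analytic node, with a placeholder address: id=$(nodetool status | host_id_for 192.0.2.13), then nodetool removenode "$id" after confirming that the printed value is the host-id of the node being replaced.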
Step6: Execute vansetup.py on the new node
cd /opt/versa/scripts/van-scripts
sudo ./vansetup.py
Step7: Sync the certificates from the Director to this node (and vice versa)
sudo su versa
/opt/versa/vnms/scripts/vnms-cert-sync.sh --sync
/opt/versa/vnms/scripts/vd-van-cert-upgrade.sh --pull (select "y" when prompted for "postpone restart")
Restart the Directors in the following sequence:
- Execute "vsh stop" on the Secondary Director
- Execute "vsh restart" on the Primary Director
- Execute "vsh start" on the Secondary Director
Confirm that HA is in sync between the Directors by running "request vnmsha actions get-vnmsha-details fetch-peer-vnmsha-details true".
Post checks:
If you are re-adding an analytic personality, perform the check below.
Check "nodetool status" on the new node (and on any existing analytic node). The new node will likely be in UJ (Up/Joining) status; it transitions to UN (Up/Normal) status only once it has finished syncing the data.
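Because the transition from UJ to UN can take a while, the check can be polled instead of re-run by hand. The following is a minimal sketch, not a Versa-provided script: node_state and wait_for_un are hypothetical names, the 60-second interval is an arbitrary choice, and the parsing assumes the standard "nodetool status" layout (two-letter code in the first column, address in the second).

```shell
#!/bin/sh
# Hypothetical helper: read "nodetool status" output on stdin and print
# the two-letter status/state code (for example UJ or UN) for an address.
node_state() {
    awk -v a="$1" '$2 == a && $1 ~ /^[UD][NLJM]$/ { print $1 }'
}

# Hypothetical helper: poll "nodetool status" until the given address
# reports UN. The 60-second sleep is an arbitrary polling interval.
wait_for_un() {
    addr=$1
    until [ "$(nodetool status | node_state "$addr")" = "UN" ]; do
        sleep 60
    done
    echo "$addr is UN"
}
```

Example usage with a placeholder address: wait_for_un 192.0.2.13. The loop has no timeout, so interrupt it manually if the node never reaches UN.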
If you are re-adding a search personality, perform the check below.
Check "vsh dbstatus" on the new node and confirm that live-nodes reflects the number of search nodes and that collections is set to "1".
Also run the following:
cd /opt/versa/scripts/van-install
sudo su
./cluster-install.sh solr cluster_status
Confirm that all replicas show up as "active".
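With many replicas, scanning the output by eye is error-prone, so the check could be automated along these lines. This sketch rests on an assumption that is not confirmed by this article: that the cluster_status output is JSON in the style of Solr's CLUSTERSTATUS response, with entries of the form "state":"active". The helper name count_non_active is hypothetical; adjust the pattern if the actual output differs.

```shell
#!/bin/sh
# Hypothetical helper: read cluster status text on stdin and print how
# many "state" entries have a value other than "active".
# ASSUMPTION: entries look like "state":"active" (Solr CLUSTERSTATUS style).
count_non_active() {
    # First grep extracts each "state":"..." pair on its own line;
    # second grep counts the pairs that do not contain "active".
    grep -o '"state" *: *"[^"]*"' | grep -vc '"active"' || true
}
```

Example usage, assuming the output was saved to a file named status.json (a placeholder name): count_non_active < status.json. A result of 0 means every state entry found was "active"; it also prints 0 if no state entries were found at all, so confirm that the output actually contains them.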
Try accessing this node from the Director UI to confirm reachability.
If you face any issues, please log a ticket with Versa TAC.