If an existing node in the cluster fails and has to be replaced (new bare-metal server or a re-spun VM), first ensure that the new node is brought up with the same software version as the existing nodes, the same number of interfaces, and a similar CPU/memory/hyper-threading profile.


After that, follow the steps below to re-add the node to the cluster.



Step1: Copy /opt/versa/scripts/van-scripts/vansetup.conf from any existing search or analytic node (matching the personality being replaced) onto the new node

 

Step2: Make sure all the interface addresses on the new node are the same as on the old node (set up /etc/network/interfaces correctly)

 

Step3: Copy the contents of /etc/hosts from any existing node to the new node. Also add an entry mapping the local listener address to the hostname if it is missing
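
As a rough aid for the check in Step3, the sketch below tests whether a hosts file already maps a given address to a given hostname. The function name has_host_mapping, the address 10.0.0.13 and the hostname analytics-3 are all hypothetical placeholders, not values from a Versa installation.

```shell
#!/bin/sh
# has_host_mapping FILE ADDRESS HOSTNAME   (hypothetical helper)
# Succeeds (exit 0) if FILE has a line whose first field is ADDRESS
# and which lists HOSTNAME among its names; fails (exit 1) otherwise.
has_host_mapping() {
    awk -v addr="$2" -v name="$3" '
        { sub(/#.*/, "") }                  # ignore comments
        $1 == addr {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

It could be used on the new node as, for example: has_host_mapping /etc/hosts 10.0.0.13 analytics-3 || echo "mapping missing". Substitute the real listener address and hostname of the node.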

 

Step4: Modify vansetup.conf on the new node to reflect the correct rpc address and listener address. In the case of the "analytic" personality, set the seeds to one of the existing analytic nodes (seeds="a.b.c.d", where a.b.c.d is the listener address of an existing analytic node)

 

Step5: In the case of an analytic node replacement, you first need to remove the host-id of the node being replaced from the Cassandra cluster. You can do that by executing the below

 

- Check the nodetool status on any of the existing analytic nodes

- Remove the host-id of the node being replaced as below

    nodetool removenode <host-id of node being replaced>
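
To avoid copying the host-id by hand, the lookup can be sketched as a small filter over the "nodetool status" output. This assumes the usual Cassandra layout, where each node line starts with the two-letter state, then the address, and contains the host-id as a 36-character UUID. The function name host_id_for and the address 10.0.0.13 are hypothetical.

```shell
#!/bin/sh
# host_id_for ADDRESS   (hypothetical helper)
# Reads "nodetool status" output on stdin and prints the host-id
# of the node whose address column equals ADDRESS.
host_id_for() {
    awk -v addr="$1" '
        $2 == addr {
            # The host-id is the only 36-character hex-and-hyphen field.
            for (i = 3; i <= NF; i++)
                if (length($i) == 36 && $i ~ /^[0-9a-fA-F-]+$/) {
                    print $i
                    exit
                }
        }
    '
}
```

For example, on an existing analytic node: nodetool status | host_id_for 10.0.0.13. Review the printed value before passing it to nodetool removenode.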

 

Step6: You can now execute vansetup.py on the new node

 

cd /opt/versa/scripts/van-scripts

sudo ./vansetup.py

 

Step7: You will also need to sync the certs from the director to this node (and vice-versa)

 

sudo su versa

 

/opt/versa/vnms/scripts/vnms-cert-sync.sh --sync

 

/opt/versa/vnms/scripts/vd-van-cert-upgrade.sh --pull  (select "y" when prompted for "postpone restart")

 

Restart the directors in the sequence below

 

- Execute "vsh stop" on the Secondary Director

- Execute "vsh restart" on the Primary Director

- Execute "vsh start" on the Secondary Director 

 

Confirm that HA is in sync between the directors by running "request vnmsha actions get-vnmsha-details fetch-peer-vnmsha-details true"

 

Post checks:

 

In case you are re-adding an analytic personality, perform the below check

 

Check "nodetool status" on the new node (and on any of the existing analytic nodes). The new node will likely be in UJ status; it transitions to UN status only once it has completed syncing the data
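
If you would rather not re-run the command by hand while the node syncs, the state check can be sketched as a filter that prints the two-letter state for one address. This assumes the usual "nodetool status" layout, with the state in the first column and the address in the second. The function name node_state, the address 10.0.0.13 and the 60-second interval are hypothetical.

```shell
#!/bin/sh
# node_state ADDRESS   (hypothetical helper)
# Reads "nodetool status" output on stdin and prints the state
# (for example UJ or UN) of the node whose address equals ADDRESS.
node_state() {
    awk -v addr="$1" '$2 == addr { print $1; exit }'
}

# Possible use on a live cluster (not run here):
#   until [ "$(nodetool status | node_state 10.0.0.13)" = "UN" ]; do
#       sleep 60
#   done
```

The loop has no timeout, so interrupt it and investigate if the node stays in UJ for much longer than the data volume would suggest.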

 

In case you are re-adding a search personality, perform the below check

 

Check "vsh dbstatus" on the new node and confirm that live-nodes reflects the number of search nodes and that collections is set to "1"

 

Also, check the below

 

cd /opt/versa/scripts/van-install

sudo su

./cluster-install.sh solr cluster_status

 

Confirm that all replicas show up as "active"

 

Try accessing this node from Director UI to confirm reachability

 

If you face any issues, please log a ticket with Versa TAC