This article provides step-by-step instructions to replace and re-add a failed node (Analytics or Search) in a Versa cluster.


Prerequisites

Before starting, ensure the following for the replacement node:

  • Same Versa software version as other nodes in the cluster.

  • Same number of network interfaces.

  • Similar CPU, memory, and hyper-threading profile.

  • IP addresses and routes identical to those of the failed node.

  • Hostname and DNS entries correctly set in /etc/hosts.
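
A quick way to sanity-check the last three items on the replacement node (a minimal sketch using standard Linux tooling; replace <failed-node-hostname> with the actual hostname):

  # ip addr show                               # confirm interface count and IP addresses
  # ip route show                              # confirm routes match the failed node
  # grep <failed-node-hostname> /etc/hosts     # confirm hostname/DNS entries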

Step 1: Prepare the Node

  1. Make sure the configuration files match those on an existing node of the same personality (Analytics or Search).

  2. Verify the following in vansetup.conf on the new node (an illustrative snippet follows this list):

    • Update the RPC address and listener address fields.

    • Set the zookeeper parameter correctly.

    • Set the replication factor to match your cluster configuration.

    • For Analytics nodes only, set the seeds parameter to the listener IP of an existing, working Analytics node in the cluster. This ensures that the replacement node communicates with the existing cluster, syncs data, and joins the Cassandra cluster.

      seeds="a.b.c.d"
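
    For orientation, the relevant portion of vansetup.conf might look like the sketch below. The parameter names shown are hypothetical placeholders, not confirmed field names; verify them against the copy of vansetup.conf taken from a working node:

      # Hypothetical parameter names, for illustration only:
      rpc_address="a.b.c.d"          # this node's RPC address
      listen_address="a.b.c.d"       # this node's listener address
      zookeeper="w.x.y.z"            # existing ZooKeeper node(s)
      replication_factor=2           # match your cluster configuration
      seeds="a.b.c.d"                # Analytics only: a working Analytics node's listener IP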
      

Step 2: Perform Pre-Setup Based on Node Type

  • Analytics Node:

    • Run the following on any existing Analytics node to remove the old node's host ID:

      # nodetool status  # Note the host-id of the failed node 
      
      # nodetool removenode <host-id>
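
    • If the removal takes a while, you can monitor its progress from the same node (removenode status is a standard Cassandra nodetool subcommand; shown here as a sketch):

      # nodetool removenode status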
      
  • Search Node:

    • Ensure the ZooKeeper cluster is up and has a leader before running the setup. Run the following command on a node running ZooKeeper:

      # vsh dbstatus
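
    • To confirm which node is currently the leader, ZooKeeper's four-letter stat command can also be used (a sketch; assumes ZooKeeper listens on its default port 2181, that nc is installed, and that the stat command is permitted on this ZooKeeper version):

      # echo stat | nc localhost 2181 | grep Mode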

Step 3: Run vansetup.py

Execute the setup script on the new node:

# cd /opt/versa/scripts/van-scripts 
# sudo ./vansetup.py

Step 4: Sync Certificates from the Director Node

# sudo su versa 
# cd /opt/versa/vnms/scripts/
# ./vnms-cert-sync.sh --sync 
# ./vd-van-cert-upgrade.sh --pull

When prompted for "postpone restart", enter y.


Step 5: Restart Directors (HA Setup Only)

  1. On Secondary Director: vsh stop

  2. On Primary Director: vsh restart

  3. On Secondary Director: vsh start

Verify HA sync between the Directors.
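
As a quick check after the restart sequence, confirm that services are running on each Director (vsh status is the same service-control CLI used in the steps above):

  # vsh status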


Post-Checks

For Analytics Node

  • Run nodetool status on both the new and existing Analytics nodes.

  • The new node will initially show UJ (Up/Joining). It should change to UN (Up/Normal) once data sync completes; see the illustrative output after this list.

  • Once the newly added node is in the UN state, run the following command on the new node to ensure data consistency:

    # nodetool repair
    
  • Once the newly added node is in the UN state, run the following command on the existing nodes to remove data that no longer belongs to them after the topology change:

    # nodetool cleanup
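
For reference, the relevant columns of nodetool status output look like the following (the addresses, sizes, and host IDs are illustrative):

    # nodetool status
    --  Address    Load     Tokens  Owns   Host ID                               Rack
    UN  10.0.0.11  1.2 GiB  256     33.3%  8d5ed9f4-7764-4dbd-bad8-43fddce94b7c  rack1
    UJ  10.0.0.14  450 MiB  256     ?      f8a0c6e1-2b3d-4a5e-9c7f-0d1e2f3a4b5c  rack1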

For Search Node

  1. Check search DB status:

    # vsh dbstatus
    
    • Ensure the live-nodes count is correct.

    • Ensure all collections are healthy.

  2. Check Solr cluster health:

    # sudo /opt/versa/scripts/van-install/cluster_install.sh solr cluster_status

    All replicas should show as active, and data should be in sync across all replicas.
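
If you prefer to inspect the cluster state directly from Solr, the Collections API provides an equivalent view (a sketch; assumes Solr's default port 8983, which may differ in your deployment):

  # curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"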

Confirm the new node is reachable via the Director UI.


Troubleshooting

If the node fails to join properly or services remain inactive:

  • Review the output of nodetool status, vsh dbstatus, and the Solr cluster status.


If, on the newly added Search node, the "Total Documents" field appears blank for any replica of a collection even though its state is shown as Active, the documents have not yet synchronized.

To verify this, navigate in the Analytics GUI to Admin → System Status → Status, and review the "Status per host/replica" table (see reference image below).

If the issue is observed, re-add the affected replica using the commands provided below to initiate document synchronization.

 


  • To remove a replica, run the command below from the CLI of any Search node. Ensure that you use the correct hostname (the IP of the newly added node), shard_name (highlighted under the Replica column in the image above), coll_name (highlighted under the Collection column), and replica_name (highlighted under the Core Name column).

    # sudo /opt/versa/scripts/van-install/cluster_install.sh solr delete_replica <coll_name> <shard_name> <replica_name>

    Example:
    # sudo /opt/versa/scripts/van-install/cluster_install.sh solr delete_replica alarmlogs shard1 core_node8
  • To re-add the replica, run the command below from the CLI of any Search node.

    # sudo /opt/versa/scripts/van-install/cluster_install.sh solr add_replica <coll_name> <shard_name>

    Example:
    # sudo /opt/versa/scripts/van-install/cluster_install.sh solr add_replica alarmlogs shard1
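
After re-adding the replica, re-run the cluster status check from the post-checks to confirm the replica is active and documents are synchronizing:

  # sudo /opt/versa/scripts/van-install/cluster_install.sh solr cluster_status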

If issues persist, open a ticket with Versa TAC and include:

  • Node type and personality

  • Output of post-check commands

  • Tech-support and shell session logs