If solr.log shows Solr timing out because it cannot contact ZooKeeper within 30 seconds (the issue shown below), add entries for all the ZooKeeper servers to /etc/hosts on every “Search” node. In practice, these are the listener addresses of the cluster nodes.

 

 

 

Adding the “listener” address of every node in your cluster to /etc/hosts on all the Search nodes is sufficient to handle the above condition.
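As a rough illustration of adding such entries safely, the sketch below appends an address/name pair to a hosts file only when the name is not already listed, so it can be re-run without creating duplicates. The function name add_host_entry, the 192.0.2.x addresses, and the zk-node-N names are placeholders invented for this example; substitute the listener addresses and names from your own cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Versa tooling): append "IP NAME" to a
# hosts file unless NAME already appears on a non-comment line.
# The third argument defaults to /etc/hosts; pass another path to try it safely.
add_host_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    # Look for NAME as a whole whitespace-delimited field after the address.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        echo "skip: $name already present in $file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
        echo "added: $ip $name"
    fi
}

# Example calls with placeholder values (run as root to modify /etc/hosts):
#   add_host_entry 192.0.2.11 zk-node-1
#   add_host_entry 192.0.2.12 zk-node-2
```

Testing against a copy of /etc/hosts first (by passing the copy's path as the third argument) is a cheap way to confirm the result before touching the real file.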

 

For example, in the /etc/hosts file below we’ve added entries for the other cluster nodes, which are acting as ZooKeeper servers

 

 

 

To list the ZooKeeper servers (the listener addresses of the cluster nodes acting as ZooKeeper servers), run the following on the Search nodes

 

grep -i zookeeper /opt/versa/scripts/van-scripts/vansetup.conf

 

Once you’ve appended the entries to the /etc/hosts file on all the Search nodes, perform the following steps

 

  1. Stop the Solr services on all the Search nodes

              sudo service monit stop

              sudo service solr stop

  2. Restart ZooKeeper on all the Search nodes

              sudo service zookeeper restart

  3. Start the Solr services on all the Search nodes

              sudo service solr start

              sudo service monit start

              sudo service versa-monit start
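The three steps above can be sketched as one small helper that runs a single phase at a time, so that each phase can be completed on every Search node before moving on to the next. The names run and search_phase and the DRY_RUN variable are assumptions made for this example; only the service commands come from the steps above.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented commands.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run one phase on the local Search node; stops at the first failing command.
search_phase() {
    case "$1" in
        stop)
            run sudo service monit stop &&
            run sudo service solr stop
            ;;
        zk)
            run sudo service zookeeper restart
            ;;
        start)
            run sudo service solr start &&
            run sudo service monit start &&
            run sudo service versa-monit start
            ;;
        *)
            echo "usage: search_phase stop|zk|start" >&2
            return 2
            ;;
    esac
}
```

A cautious way to use it is to set DRY_RUN=1 first and review the printed commands, then run search_phase stop on every Search node, followed by search_phase zk on every node, and finally search_phase start on every node.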


 

 

Once done, run “vsh dbstatus” on all the Search nodes to confirm that “live-nodes” shows the correct number of Search nodes and that “collections” shows “1”

 

If you continue to see Solr failures in the output of “vsh dbstatus”, further troubleshooting is needed

 

 

Side note: The above solution also applies if Solr is stuck in the “recovering” state for a long time. In that case, you might see the below logs in solr.log