If solr.log shows Solr timing out because it is unable to contact ZooKeeper within 30 seconds, the solution is to add entries for all the ZooKeeper servers (that is, the listener addresses of all the cluster nodes) to /etc/hosts on all the “Search” nodes.
Adding the “listener” address of every node in your cluster to /etc/hosts on all the Search nodes should be enough to resolve this condition.
For example, the /etc/hosts file on a Search node would contain entries for the other cluster nodes that are acting as ZooKeeper servers.
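As a minimal sketch of how such entries could be appended without creating duplicates, the helper below adds an "address name" line to a hosts file only when the name is not already mapped. The function name `add_hosts_entry`, the 192.0.2.x addresses, and the `search-node-N` names are hypothetical placeholders, not values from a real deployment; substitute the listener addresses and hostnames of your own cluster nodes.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): append "IP NAME" to a hosts
# file unless NAME already appears as a hostname in a non-comment line.
# Usage: add_hosts_entry IP NAME [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
add_hosts_entry() {
    ip="$1"
    name="$2"
    hosts_file="${3:-/etc/hosts}"

    # Scan the fields after the address; stop at an inline comment.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break
                if ($i == n) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"; then
        echo "already present: $name"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
        echo "added: $ip $name"
    fi
}

# Example (placeholder values; run as root so /etc/hosts is writable):
#   add_hosts_entry 192.0.2.11 search-node-1
#   add_hosts_entry 192.0.2.12 search-node-2
```

Because the function appends with a plain redirect, it has to run in a root shell when the target is /etc/hosts. Running it a second time with the same name leaves the file unchanged.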
To list the ZooKeeper servers (the listener addresses of the cluster nodes acting as ZooKeeper servers), run the following on the Search nodes:
grep -i zookeeper /opt/versa/scripts/van-scripts/vansetup.conf
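Once you have the ZooKeeper addresses, a quick way to see which of them still lack an /etc/hosts entry is sketched below. The layout of the vansetup.conf output is not shown here, so this sketch assumes you pass the addresses (or hostnames) by hand. The function name `missing_hosts_entries` and the sample addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): print every given address or
# hostname that does not appear in any non-comment line of the hosts file.
# Usage: missing_hosts_entries HOSTS_FILE ENTRY...
missing_hosts_entries() {
    hosts_file="$1"
    shift
    for entry in "$@"; do
        # awk exits 0 when the entry is found, 1 otherwise.
        awk -v e="$entry" '
            /^[[:space:]]*#/ { next }
            {
                for (i = 1; i <= NF; i++) {
                    if ($i ~ /^#/) break
                    if ($i == e) found = 1
                }
            }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file" || echo "$entry"
    done
}

# Example (placeholder addresses):
#   missing_hosts_entries /etc/hosts 192.0.2.11 192.0.2.12 192.0.2.13
```

An empty result means every address passed in already has an entry on that node. Anything printed still needs to be added.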
Once you’ve appended the entries to the /etc/hosts file on all the Search nodes, run the following steps:
- Stop the solr services on all the Search nodes
sudo service monit stop
sudo service solr stop
- Restart Zookeeper on all the Search nodes
sudo service zookeeper restart
- Start the solr services on all the Search nodes
sudo service solr start
sudo service monit start
sudo service versa-monit start
Once done, run “vsh dbstatus” on all the Search nodes and confirm that “live-nodes” shows the correct number of Search nodes and that “collections” shows “1”.
If the output of “vsh dbstatus” still shows a Solr failure, further troubleshooting is needed.
Side note: Apply the same solution if Solr is stuck in the “recovering” state for a long time. In that case, you might see the following logs in solr.log: