Solr failure or stuck in recovering due to zookeeper connection timeout : Versa Support

If solr.log shows Solr timing out because it cannot contact ZooKeeper within 30 seconds, the solution is to add entries for all the ZooKeeper servers (that is, the listener addresses of all the cluster nodes) to /etc/hosts on all the “Search” nodes.

Adding the “listener” address of every node in your cluster to /etc/hosts on all the Search nodes should be sufficient to handle the above condition.

For example, the /etc/hosts file on a Search node would contain entries for the other cluster nodes that are acting as ZooKeeper servers.
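As an illustration, the entries could be appended with a small helper like the one below. This is only a sketch: the function name `add_host_entry`, the IP addresses and the hostnames are hypothetical placeholders, not values from a real deployment. The helper skips a hostname that is already listed, so running it twice does not create duplicate lines.

```shell
# Sketch: append "IP<TAB>hostname" to a hosts file unless the hostname is already listed.
# add_host_entry and all addresses/names used with it are hypothetical placeholders.
add_host_entry() {
    hosts_file="$1"
    ip="$2"
    name="$3"
    # Search every non-comment line for the hostname as a whole field (fields 2..NF).
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file" 2>/dev/null; then
        echo "skip: $name already present"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
        echo "added: $name"
    fi
}

# Example call (run as root on each Search node; values are placeholders):
#   add_host_entry /etc/hosts 10.0.0.11 van-node-1
```

It is worth taking a backup copy of /etc/hosts before editing it.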

To get a list of the ZooKeeper servers (the listener addresses of the cluster nodes acting as ZooKeeper servers), run the following on the Search nodes:

cat /opt/versa/scripts/van-scripts/vansetup.conf | grep -i zookeeper
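After /etc/hosts has been updated, each name can be checked against the local resolver. The sketch below uses the standard `getent hosts` lookup. The helper name `check_resolves` and the hostnames passed to it are hypothetical; substitute the names reported by the command above.

```shell
# Sketch: report which of the given hostnames resolve through the system resolver
# (which consults /etc/hosts). check_resolves is a hypothetical helper name.
check_resolves() {
    missing=0
    for name in "$@"; do
        if getent hosts "$name" >/dev/null 2>&1; then
            echo "ok: $name"
        else
            echo "unresolved: $name"
            missing=1
        fi
    done
    # Return non-zero if at least one name failed to resolve.
    return "$missing"
}

# Example call (placeholder names):
#   check_resolves van-node-1 van-node-2 van-node-3
```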

Once you have appended the entries to the /etc/hosts file on all the Search nodes, perform the following steps:

Stop the solr services on all the Search nodes

sudo service monit stop

sudo service solr stop

Restart Zookeeper on all the Search nodes

sudo service zookeeper restart

Start the solr services on all the Search nodes

sudo service solr start

sudo service monit start

sudo service versa-monit start

Once done, run “vsh dbstatus” on all the Search nodes and confirm that “live-nodes” shows the correct number of Search nodes and that “collections” shows “1”.

If you continue to see a Solr failure in the output of “vsh dbstatus”, further troubleshooting is needed.
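As a possible first check, a TCP probe from each Search node can help tell a name-resolution problem apart from a connectivity problem. The sketch below is an assumption-laden example, not part of the documented procedure. It assumes ZooKeeper listens on its default client port 2181 (your deployment may use a different port), that `bash` and `timeout` are available, and it uses a hypothetical helper name, `zk_port_open`.

```shell
# Sketch: succeed if a TCP connection to HOST:PORT can be opened within 3 seconds.
# zk_port_open is a hypothetical helper; 2181 is ZooKeeper's default client port
# and is an assumption about this deployment.
zk_port_open() {
    host="$1"
    port="${2:-2181}"
    # bash's /dev/tcp pseudo-device opens a TCP socket; timeout bounds the wait.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example call (placeholder name):
#   zk_port_open van-node-1 && echo "reachable" || echo "NOT reachable"
```

A name that resolves but does not accept connections points to a network or service problem rather than a missing /etc/hosts entry.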

Side note: The above solution also applies if Solr is stuck in the “recovering” state for a long time. In that case you might see the corresponding log entries in solr.log.
