Purpose
We've written a Python script to simplify log collection from an Analytics cluster (including setups with log-forwarders). It is intended to minimize the effort on your part and to make all the fundamental/pertinent information about your system available so that troubleshooting can start quickly.
You only need to enter the management addresses of your cluster nodes in the cluster.conf file, after which the script logs into these nodes, fetches the pertinent logs, and prints an output that flags any anomalies/discrepancies found on your system.
The various outputs collected by the script are written to a text file called analytics.debug (in the same directory as the script), which you can then upload/attach to the Versa TAC ticket/case along with the terminal output generated by the script.
Action Required
Please download the attached zip file and unzip its contents (it contains two Python scripts, acdc.py and log.py, along with a cluster.conf file that already has 2 sample IP addresses in it). You only need to copy these files to your master Director, or any server, that has access to the northbound/management IP addresses of your Analytics cluster nodes and log-forwarders (if applicable).
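For example, assuming the files were unzipped on your workstation and the Director is reachable over SSH, you could copy them over as below (the Director address, username, and destination directory are placeholders; adjust them for your setup):

scp acdc.py log.py cluster.conf admin@<director-mgmt-ip>:/home/admin/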
The script uses Python 3.
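You can confirm that Python 3 is available on the Director/server before running the script:

python3 --version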
<update 03/03/2021>
You can also download the .tar bundle attached to this KB and copy it over to the Director. You can untar this bundle on the Director as shown below.
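A typical extraction would look like the following (the bundle name is a placeholder; use the actual file name of the attachment):

tar -xvf <bundle-name>.tar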
Set the permission on the extracted files as shown below.
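Assuming this refers to making the scripts executable (the exact permissions required may differ in your environment), the following would work:

chmod +x acdc.py log.py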
You can then edit the cluster.conf file and proceed with script execution (further details below).
Details
After updating cluster.conf with the address:username:password information of your cluster nodes, you only need to execute the script from the Director/server that has northbound reachability to the cluster nodes.
As seen below, you need to enter the information for the cluster nodes in cluster.conf (the management addresses of the cluster nodes and log-forwarders) along with the username:password for each.
vi cluster.conf
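A populated cluster.conf would look like the following, one node per line in address:username:password format (the addresses and credentials below are hypothetical examples only):

cat cluster.conf
10.10.10.1:admin:versa123
10.10.10.2:admin:versa123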
After updating the cluster.conf file, you only need to execute the acdc.py script as below.
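Since the script uses Python 3, invoking it with the python3 interpreter should be sufficient (run it from the directory containing the scripts and cluster.conf):

python3 acdc.py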
The script is designed to fetch all the pertinent outputs required for checking the overall service, application, and database status. The list of outputs gathered by the script is as below.
After the script has run, all the outputs fetched from the cluster nodes are written to a file called analytics.debug (a fresh copy of analytics.debug is created every time the script is executed).
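You can verify that the file was generated in the script directory before uploading it, for example:

ls -l analytics.debug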
The script also prints the obvious discrepancies/anomalies it has detected on the system under "Preliminary observation": chiefly anomalies around service status, CPU/memory status, DSE status, and database access failures (cqlsh), along with suggestions on how to recover from the failure.
<sample one>
You can then upload the analytics.debug file to the case/ticket for our reference, along with the terminal logs generated by the script above.