This document covers setting up the Directors as remote-collectors on the “log-forwarders”, or on the nodes in the analytics cluster that also act as log-collectors. Log-forwarders are nodes that act purely as log-collectors, with no local database (Cassandra/Solr). They in turn send the logs to the analytics cluster nodes by establishing a connection to Cassandra/Solr on those nodes.
The Versa Directors are added as remote-collectors on the log-collectors so that alarm logs can be streamed to the Directors and made available in their “Monitor” view.
The configuration to set up the Directors as remote-collectors is shown below.
In the configuration above, both Directors are added under the same “collector-group”. A collector-group works in “failover” mode: it forwards logs to only one Director at a time, starting with the first Director in the listing.
You can check the collector status under admin/status/log-collector-status in your Analytics UI.
Your Directors are listed on that page; refresh it to see the statistics incrementing for one Director. The log-collector sends logs to one of the Directors and, in the event of a “connection flap” towards that Director, switches to the other Director (this is what operating in “failover mode” means).
The collector-group works in failover mode, and there is no auto-revert unless you statically set a “primary collector”.
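The failover behavior described above can be illustrated with a small toy model. This is only a sketch of the described logic, not Versa code: the class `CollectorGroup`, its methods, and the names `director-1` and `director-2` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectorGroup:
    """Toy model of a failover collector-group (illustration only)."""
    collectors: List[str]
    primary: Optional[str] = None      # models a statically set primary collector
    active: str = field(init=False)

    def __post_init__(self) -> None:
        # Start with the primary if one is set, else the first collector listed.
        self.active = self.primary or self.collectors[0]

    def connection_flap(self) -> None:
        """The active collector's connection drops: move to the next one."""
        idx = self.collectors.index(self.active)
        self.active = self.collectors[(idx + 1) % len(self.collectors)]

    def connection_restored(self, collector: str) -> None:
        """A collector comes back: revert only if it is the configured primary."""
        if collector == self.primary:
            self.active = collector


group = CollectorGroup(["director-1", "director-2"])
group.connection_flap()                   # connection to director-1 flaps
group.connection_restored("director-1")   # director-1 is reachable again
print(group.active)                       # director-2: no auto-revert

pinned = CollectorGroup(["director-1", "director-2"], primary="director-1")
pinned.connection_flap()
pinned.connection_restored("director-1")
print(pinned.active)                      # director-1: reverts to the primary
```

Note that nothing in this model looks at which Director currently holds mastership; the choice of active collector depends only on connection state.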
Challenges with the above configuration
The problem with the above configuration is that Analytics has no way of knowing which Director is the “Master”. If a Director mastership failover occurs (for example, triggered by a service/task failure on the Master Versa Director, or VD), Analytics continues to send logs to the Director that has now transitioned to “Slave” mode.
The Slave VD does not process the logs received from the Analytics nodes/log-collectors (this is expected behavior). The Master VD, which is now not receiving logs from the log-collectors, stops populating alarms in its “Monitor” view.
The bottom line is that the log-collector can never know the mastership status of a Director, so the above configuration can always lead to a situation where the log-collector sends logs/alarms to the Standby VD, where they are discarded.
Solution
The solution is to create two collector-groups, one for each Director, reference both collector-groups in a “collector-group-list”, and use this collector-group-list in the “remote-collector-profile” under the exporter rules.
Because there are now two collector-groups, one per Director, referenced in a collector-group-list, the logs/alarms are streamed to both Directors all the time.
The standby Director always discards them and the master Director processes them. This configuration ensures that the master VD receives logs at all times, even in the event of a mastership failover.
A configuration example is shown below.
Now check the statistics to confirm that logs are being sent to both Directors.
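To see why one collector-group per Director removes the dependency on mastership, the two layouts can be compared in a toy model. This is a sketch under simplifying assumptions, not Versa code: the function `deliver`, the Director names, and the rule that the first entry of each group is its active collector are all hypothetical.

```python
from typing import Dict, List


def deliver(groups: List[List[str]], master: str) -> Dict[str, str]:
    """Send one alarm log to the active collector of every group in the list.

    Each inner list models one collector-group; its first entry is treated
    as the active collector. Returns what each Director does with the log.
    """
    outcome = {}
    for group in groups:
        director = group[0]
        outcome[director] = "processed" if director == master else "discarded"
    return outcome


single_group = [["director-1", "director-2"]]    # one failover group
group_list = [["director-1"], ["director-2"]]    # one group per Director

# Assume a mastership failover has made director-2 the Master.
print(deliver(single_group, master="director-2"))
print(deliver(group_list, master="director-2"))
```

With the single failover group, the only copy of the log reaches director-1 and is discarded. With one group per Director, both receive it, so whichever Director holds mastership processes it.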
Some commonly asked questions
Question 1: If we set up two collector groups, will a Director accept logs from the analytics cluster when in standby?
[Reply] When the Director is in standby mode it does not process the logs received from Analytics, although it continues to listen on the collector port 20514. The idea behind setting up separate collector groups is to ensure that alarm logs are streamed to both Directors at all times: the standby Director always discards them, while the current master receives and processes them. This avoids the situation where the master VD does not receive alarms because of a failover within the collector group.
Question 2: If the standby Director discards logs from analytics, will the standby receive these logs via the Director DB sync? I’m concerned that if we have a Director failover from Primary to Secondary – the secondary Director will not have any logs and we are back to the original issue.
[Reply] Yes, the Master VD always syncs its alarms with the Standby VD (this is a continual sync that occurs as part of the HA task). So during a failover you will find all the alarms on the standby VD.
Question 3: With two collector groups, what’s expected in the event of a split-brain or dual primary Directors? Will the Director database be negatively impacted?
[Reply] No. During a split-brain condition both Directors will send email notifications (this should not be an issue; it just results in duplicate mails). Once the split-brain is resolved, the Master VD continues to sync its alarms with the Standby VD.
Question 4: For internal monitoring purposes, aside from checking ‘flaps’ how can we tell if Analytics are sending logs to the correct or incorrect Director? Maybe if there are any changes with the Analytics ‘ACTIVE COLLECTOR’ configuration?
[Reply] Check the remote-collector statistics. The msg-sent counter increments (keep refreshing the page) for the Director to which the logs are being streamed.
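If you record the msg-sent values from two refreshes, the comparison can be automated with a few lines. The following is a minimal sketch that assumes the counters have already been copied into dictionaries by hand; the function `receiving_directors`, the Director names, and the sample values are hypothetical, and no Versa API is used.

```python
from typing import Dict, List


def receiving_directors(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return the Directors whose msg-sent counter grew between two samples."""
    return sorted(name for name, count in after.items()
                  if count > before.get(name, 0))


# Hypothetical msg-sent values noted from two page refreshes.
sample_1 = {"director-1": 1200, "director-2": 340}
sample_2 = {"director-1": 1200, "director-2": 395}

print(receiving_directors(sample_1, sample_2))   # ['director-2']
```

With two collector groups in a collector-group-list, both Directors would be expected to appear in the result; with a single failover group, only one would.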
Question 5: I see that we can set the ‘PRIMARY COLLECTOR’, is this recommended and would it help in case of a fail-over event?
[Reply] No, you should ideally not set a primary-collector, because we can never know when a Director will take up mastership. The ideal configuration is to use two collector-groups, one for each Director.