Problem Description:

Commits from UI are in the order of 20 seconds to 30 seconds


Issue seen:

NCS level commit also was taking 10 seconds same as device commit.

From UI it was taking 20-30seconds. 

Debugging Steps:

1)  Checked ncs --status out and searched for "waiting". Usually, if we see this "waiting" string for a long time against running datastore then there is an offending commit or subscription lock taken on NCS which is not released. That was not the case


2)  Checked that GET API's which are faster than usual. Top output also was normal.

curl -k -v -u Administrator:Versa@123  https://10.192.220.193:9182/api/config/system -H "Content-Type: application/json" -H "Accept: application/json"

It means it is not a general slowness-ness problem in NCS but only commits are slower.


3)  Stopped slave suspecting to see if synchronous writes taking time. In fact, there was a message like "pending replies from a slave" in ncs --status output. Removed un-reachable syslog server configured in UI. Both of these didn’t help.


4)  Checked stack dump file ( jstack <pid> ). This reported that the respective http-thread ( figured out from springboot/access.log with PUT request ) is either getting blocked on audit code(not NCS code) or finishTrans of NCS.

springboot/access.log printed that it is taking 30 sec approx for every PUT call ( Tried device interface description change from UI). From CLI, consistently commit was taking 10 secs.


5)  At same-time, we saw continuous prints with the below statements in ncs/devel.log 

"ncs the datastore running is locked in cdb"

"ncs the datastore running is locked in cdb"

"ncs the datastore running is locked in cdb"

"ncs the datastore running is locked in cdb"

This is an indication that locks are being taken on CDB continuously so commit from CLI/UI is getting delayed. There is a contention for write lock in NCS.


6) Executed below command to see what writes are happening to DB to understand who is taking lock.

ncs_cmd -c  'subwait_iter / 10 9999'

Unfortunately, this was not printing anything. So it means write lock is taken not because of CDB Writes to running data-store.


7) Once again checked ncs --staus output. Searched for this message in "has the transaction lock running.." and it pointed to a different 

tid value each time like below. 

transaction tid=1812197 has the trans lock on running


8) Now enabled NCS Java logging in file changing "info" to "debug". This needs vsh restart.

/opt/versa/vnms/etc/conf/ncslog4j/log4j.xml


9) Again noted down tid value for which "has the trans lock on running" was printed.

Searched for that in vnms-spring-boot-ncs.log.

cat vnms-spring-boot-ncs.log | grep <tid>


It pointed to APPL-MON-THD which was doing device connect for tid - 128922

APPL-MON-THD-391-NV-VNF-DR-GW01-LON][PROTO_LOGGER] SEND op=241,th=128922 {{hxml,[]},[[ncs|devices],device,{#Bin<VNF-DR-GW01>},connect]


And this was clue as we know device connect will take lock with in NCS.

Since there were 25 Threads in Colt Setup with 2300 devices always there is thread in Appliance Monitoring attempting to perform device/connect creating lock contention delayed a commit/write to CDB.


Please note that this issue is applicable only for 21.2.2 release, not 20.2.x and 21.1.x.