This article describes how to fix the HA failure on SLAEV Director when the UI reports the "Postgres operation failed" error.

Director task error:


The PostgreSQL service also goes down on the Director:


Log files:

/var/log/postgresql/postgresql-11-main.log

/var/log/vnms/ha/postgre-ha.log

Error Pattern:

/var/log/postgresql/postgresql-11-main.log



2022-07-29 07:43:57.217 PDT [17101] vnms@vnms ERROR:  current transaction is aborted, commands ignored until end of transaction block

2022-07-29 07:43:57.217 PDT [17101] vnms@vnms STATEMENT:  select 1

2022-07-29 07:44:05.257 PDT [17133] repmgr@repmgr ERROR:  relation "repmgr.nodes" does not exist at character 214

2022-07-29 07:44:05.257 PDT [17133] repmgr@repmgr STATEMENT:      SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, un.node_name AS upstream_node_name, NULL AS attached       FROM repmgr.nodes n  LEFT JOIN repmgr.nodes un         ON un.node_id = n.upstream_node_id WHERE n.node_id = 1

2022-07-29 07:44:05.291 PDT [17138] repmgr@repmgr ERROR:  relation "repmgr.nodes" does not exist at character 214

2022-07-29 07:44:05.291 PDT [17138] repmgr@repmgr STATEMENT:      SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, un.node_name AS upstream_node_name, NULL AS attached       FROM repmgr.nodes n  LEFT JOIN repmgr.nodes un         ON un.node_id = n.upstream_node_id WHERE n.node_id = 1

2022-07-29 07:44:05.325 PDT [17142] postgres@postgres ERROR:  role "repmgr" already exists

2022-07-29 07:44:05.325 PDT [17142] postgres@postgres STATEMENT:  CREATE ROLE repmgr SUPERUSER CREATEDB CREATEROLE INHERIT LOGIN;

2022-07-29 07:44:05.358 PDT [17145] postgres@postgres ERROR:  database "repmgr" already exists

2022-07-29 07:44:05.358 PDT [17145] postgres@postgres STATEMENT:  CREATE DATABASE repmgr OWNER repmgr;

2022-07-29 07:44:05.686 PDT [15853] LOG:  received fast shutdown request

2022-07-29 07:44:05.689 PDT [15853] LOG:  aborting any active transactions

2022-07-29 07:44:05.689 PDT [16113] vnms@vnms FATAL:  terminating connection due to administrator command

2022-07-29 07:44:05.689 PDT [16061] vnms@vnms FATAL:  terminating connection due to administrator command

2022-07-29 07:44:05.691 PDT [15853] LOG:  background worker "logical replication launcher" (PID 15860) exited with exit code 1

2022-07-29 07:44:05.691 PDT [15855] LOG:  shutting down

2022-07-29 07:44:05.704 PDT [15853] LOG:  database system is shut down

2022-07-29 07:45:06.562 PDT [17570] LOG:  listening on IPv4 address "0.0.0.0", port 5432

2022-07-29 07:45:06.562 PDT [17570] LOG:  listening on IPv6 address "::", port 5432

2022-07-29 07:45:06.563 PDT [17570] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"

2022-07-29 07:45:06.580 PDT [17571] LOG:  database system was interrupted; last known up at 2022-07-29 07:44:57 PDT

2022-07-29 07:45:06.614 PDT [17571] FATAL:  syntax error in file "recovery.conf" line 4, near token "repmgr_slot_2"

2022-07-29 07:45:06.616 PDT [17570] LOG:  startup process (PID 17571) exited with exit code 1

2022-07-29 07:45:06.616 PDT [17570] LOG:  aborting startup due to startup process failure

2022-07-29 07:45:06.617 PDT [17570] LOG:  database system is shut down

pg_ctl: could not start server
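
To confirm quickly that a Director is hitting this pattern, the PostgreSQL log can be searched for the key messages shown above. The following is a minimal sketch: the function name `scan_pg_log` is hypothetical, and the default path is the log file listed in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the product): list the lines of a
# PostgreSQL log that match the error signatures quoted in this article.
scan_pg_log() {
    local log_file="${1:-/var/log/postgresql/postgresql-11-main.log}"
    # -n prefixes each match with its line number in the log,
    # -E enables the alternation (|) between the three signatures.
    grep -nE 'syntax error in file "recovery\.conf"|relation "repmgr\.nodes" does not exist|aborting startup due to startup process failure' "$log_file"
}
```

Calling `scan_pg_log` with no argument reads the default log. The function returns a non-zero status when none of the signatures is present.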



/var/log/vnms/ha/postgre-ha.log



This issue is caused by Bug-82751.


Workaround:


Perform the workaround below only after the PostgreSQL service start message described in step 1 appears in the log, and complete it within a few minutes (about 5).

1. In /var/log/vnms/ha/postgre-ha.log, wait until you see the PostgreSQL service being started:

[Sat Jul 30 05:10:41 UTC 2022] Drop and recreate repmgr database

NOTICE:  database "repmgr" does not exist, skipping

[Sat Jul 30 05:10:42 UTC 2022] Stopping PostgreSQL service..

[Sat Jul 30 05:10:44 UTC 2022] [Stopped PostgreSQL]

[Sat Jul 30 05:10:54 UTC 2022] Modify repmgr configuration..

[Sat Jul 30 05:10:54 UTC 2022] Starting PostgreSQL service..
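
Because the workaround is time-sensitive, it may help to have the shell signal when the start message from step 1 appears. The sketch below polls the HA log once per second for new lines containing `Starting PostgreSQL service..`. The function name `wait_for_pg_start` and the 600-second default timeout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until a NEW "Starting PostgreSQL service.."
# line is appended to the HA log, or until the timeout expires.
wait_for_pg_start() {
    local ha_log="${1:-/var/log/vnms/ha/postgre-ha.log}"
    local timeout_s="${2:-600}"              # assumed upper bound, in seconds
    local marker='Starting PostgreSQL service..'
    local baseline waited=0

    # Remember the current length so earlier attempts in the log are ignored.
    baseline=$(wc -l < "$ha_log") || return 2

    while [ "$waited" -lt "$timeout_s" ]; do
        # Inspect only the lines appended since the helper was started.
        if tail -n "+$((baseline + 1))" "$ha_log" | grep -qF -- "$marker"; then
            echo "Marker seen at $(date); start the workaround now."
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done

    echo "Timed out after ${timeout_s}s waiting for: $marker" >&2
    return 1
}
```

Start the helper before the HA operation is triggered. A start message that was written before the helper began is deliberately not counted.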

2. Open the /var/lib/postgresql/11/main/recovery.conf file and make the following changes:

[Administrator@director-2: ~] $ sudo su

root@director-2:/home/Administrator# vi /var/lib/postgresql/11/main/recovery.conf
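
Before editing, it can be useful to keep a copy of the file and to print the line that PostgreSQL reports in the log above (line 4, near token "repmgr_slot_2"). This is a minimal sketch: the function name `backup_and_show_line` and the `.bak.<timestamp>` suffix are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save a timestamped copy of a config file, then
# print one line of it (default: line 4, the line named in the error).
backup_and_show_line() {
    local conf="${1:-/var/lib/postgresql/11/main/recovery.conf}"
    local line_no="${2:-4}"
    local backup
    backup="$conf.bak.$(date +%Y%m%d%H%M%S)"

    # -p keeps ownership, mode and timestamps on the copy.
    cp -p "$conf" "$backup" || return 1
    echo "Backup written to $backup" >&2

    # Print only the requested line number.
    sed -n "${line_no}p" "$conf"
}
```

Run it as root, as in the session above, since the file lives in the PostgreSQL data directory.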



3. Restart the PostgreSQL service using the command below:

sudo systemctl restart postgresql


4. Check the HA status in /var/log/vnms/ha/postgre-ha.log.
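
In addition to the HA log, the restart in step 3 can be cross-checked against the PostgreSQL log. The sketch below compares the most recent startup outcome: the failure message is the one quoted in this article, and the success message ("database system is ready to accept ...") is the standard PostgreSQL startup line, which does not appear in the excerpt above. The function name `pg_last_start_ok` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the latest startup outcome recorded
# in the PostgreSQL log is "ready to accept ... connections".
pg_last_start_ok() {
    local log_file="${1:-/var/log/postgresql/postgresql-11-main.log}"
    local last

    # Keep only outcome lines, then take the most recent one.
    last=$(grep -E 'database system is ready to accept|aborting startup due to startup process failure' "$log_file" | tail -n 1) || true

    case "$last" in
        *'database system is ready to accept'*) return 0 ;;
        *) return 1 ;;   # startup failure, or no outcome line found
    esac
}
```

For example, `pg_last_start_ok && echo "PostgreSQL started cleanly"` prints the message only when the last recorded start succeeded.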


UI status: