The FMS in HA mode keeps restarting (4261697)

Title

The FMS in HA mode keeps restarting
Description

The FMS (Foglight Management Server) in HA (High Availability) mode keeps restarting.
In the FMS logs a shutdown request can be seen before the event; for example:
YYYY-MM-DD hh:mm:ss.SSS VERBOSE [Thread-168] main\src\native\Launcher\src\windows\control - Shutdown request from: SYSTEM. Process ID 6556

Messages of the following type may be found in the server_restarter log:

YYYY-MM-DD hh:mm:ss.SSS WARN [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Forge Server has not responded to health checking 761 seconds after startup. Next retry in 60 seconds.
YYYY-MM-DD hh:mm:ss.SSS INFO [Thread-2] QcnUtil\src\windows\qcn_service - Received stop request. the FoglightHA service will now be shut down.
YYYY-MM-DD hh:mm:ss.SSS INFO [Thread-4] QcnUtil\src\windows\qcn_process_windows - Shutdown request transmitted.

YYYY-MM-DD hh:mm:ss.SSS ERROR [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Forge Server is not yet available 5,401 seconds after startup, grace time is up and the server will be restarted.
YYYY-MM-DD hh:mm:ss.SSS INFO [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Stopping Forge Server.
YYYY-MM-DD hh:mm:ss.SSS INFO [server-heartbeat-listener] QcnUtil\src\windows\qcn_process_windows - Shutdown request transmitted.
YYYY-MM-DD hh:mm:ss.SSS INFO [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Forge Server stopped normally.
YYYY-MM-DD hh:mm:ss.SSS INFO [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Attempting (1/1) to restarting Forge Server with the command bin\fms -Dfoglight.cluster.mode=true -Dquest.common.process-runner=false -Dquest.native.launcher.io=true ...
YYYY-MM-DD hh:mm:ss.SSS INFO [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Forge Server reports normal health state.

YYYY-MM-DD hh:mm:ss.SSS VERBOSE [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Fail to check Forge Server health state.
org.springframework.remoting.RemoteConnectFailureException: Could not connect to HTTP invoker remote service at [http://localhost:8080/foglight-sl/HealthCheck]; nested exception is java.net.ConnectException: Connection refused: connect
...
Caused by: java.net.ConnectException: Connection refused: connect

YYYY-MM-DD hh:mm:ss.SSS VERBOSE [server-heartbeat-listener] com.quest.nitro.ha.monitor.local.ServerHeartBeatListener - Fail to check Forge Server health state.
org.springframework.remoting.RemoteAccessException: Could not access HTTP invoker remote service at [https://localhost:8443/foglight-sl/HealthCheck]; nested exception is javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Cause

CAUSE 1
The health check URL is configured to use HTTPS and the certification path cannot be validated.

CAUSE 2
The FMS is not listening on the localhost address. The FMS host has multiple IP addresses and the FMS was bound to one of them using the steps from KB article 57902.

CAUSE 3
The default HTTP port was modified.

CAUSE 4
The restart monitor gives the server a grace period of 60 minutes (3600 seconds) by default; if the FMS takes longer than that to start up, the health check fails and the server is restarted.

CAUSE 5
The upgrade process took a very long time to enable the cartridges, and during this time the HA server restarter exceeded the grace period that it waits for the FMS to start up.
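
The sample messages in the Description can help narrow down which cause applies. The following Python sketch scans server_restarter log text for three of those messages. The pairing of each message with a cause is an assumption inferred from this article, not a documented mapping, and the names classify and SIGNATURES are hypothetical.

```python
# Substrings taken from the sample server_restarter messages in this article,
# paired with the causes they most plausibly indicate (an assumed mapping).
SIGNATURES = [
    ("PKIX path building failed",
     "CAUSE 1: HTTPS certification path cannot be validated"),
    ("Connection refused",
     "CAUSE 2 or 3: nothing is listening on the health check host or port"),
    ("grace time is up",
     "CAUSE 4 or 5: startup exceeded the grace period"),
]


def classify(log_text):
    """Return the distinct likely causes found in the log text, in first-seen order."""
    found = []
    for line in log_text.splitlines():
        for needle, cause in SIGNATURES:
            if needle in line and cause not in found:
                found.append(cause)
    return found
```

A result with more than one entry suggests that more than one resolution may be needed; an empty result means none of the three messages was present.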
Resolution

RESOLUTION 1

Change the restart monitoring configuration to use HTTP:

Check the %FMS_HOME%\config\restart_monitor.config file to see if the health check URL is set as follows:

health.check.url = "https://localhost:8443/foglight-sl/HealthCheck";

If the health check URL uses HTTPS, modify the value of the health.check.url parameter and set it as follows:

health.check.url = "http://localhost:8080/foglight-sl/HealthCheck";

After the health.check.url parameter is modified, the change should be reflected in the server_restarter log; if it is not, stop the FMS, verify that the FMS is stopped, and then start the FMS in HA mode.

Note: Another option is to import the certificates to Foglight's TrustStore and update the health check URL on both nodes with their respective node/host name.
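
If the same change has to be made on more than one node, the edit can be scripted. The following Python sketch rewrites a quoted parameter value in configuration text. It assumes the parameter appears on a single line in the name = "value"; form shown above; the function name set_param is hypothetical and is not part of Foglight. Back up restart_monitor.config before overwriting it.

```python
import re


def set_param(config_text, name, value):
    """Replace the quoted value of a `name = "...";` line and return the new text."""
    pattern = re.compile(
        r'^(\s*' + re.escape(name) + r'\s*=\s*)"[^"]*"(\s*;)',
        re.MULTILINE,
    )
    # A function is used as the replacement so that backslashes in `value`
    # are inserted literally rather than interpreted as escapes.
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{value}"{m.group(2)}', config_text
    )
    if count == 0:
        raise KeyError(f"{name} not found in configuration text")
    return new_text
```

For example, passing the HTTPS line shown above together with "http://localhost:8080/foglight-sl/HealthCheck" returns the HTTP line, leaving every other line of the text unchanged.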

RESOLUTION 2

Modify the health.check.url in the restart_monitor.config file (%FMS_HOME%\config\restart_monitor.config) to use the bound IP address as follows:

health.check.url = "http://boundip:8080/foglight-sl/HealthCheck";

RESOLUTION 3

In the restart_monitor.config file, change the health.check.url value so that its port matches the non-default HTTP port specified in server.config.

Example:

In server.config:

server.http.port = "8181";

In restart_monitor.config:

health.check.url = "http://hostname:8181/foglight-sl/HealthCheck";
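
To avoid the two files drifting apart, the expected URL can be derived from server.config. The following Python sketch reads server.http.port from the configuration text and builds the matching health check URL. It assumes the port appears in the server.http.port = "8181"; form shown above and falls back to 8080 when the parameter is not set; the function name health_check_url is hypothetical.

```python
import re


def health_check_url(server_config_text, host="localhost", default_port=8080):
    """Build the HTTP health check URL from the port set in server.config text."""
    match = re.search(
        r'^\s*server\.http\.port\s*=\s*"?(\d+)"?\s*;',
        server_config_text,
        re.MULTILINE,
    )
    # Use the default port when server.http.port is absent or commented out.
    port = int(match.group(1)) if match else default_port
    return f"http://{host}:{port}/foglight-sl/HealthCheck"
```

Compare the returned URL with the health.check.url value in restart_monitor.config; if they differ, update restart_monitor.config.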

RESOLUTION 4

If the FMS takes longer than the default grace period to start, consider increasing the startup.grace parameter in the restart_monitor.config file (%FMS_HOME%\config\restart_monitor.config) to a value, in seconds, greater than the time the FMS needs to start.

Default value is:

startup.grace = 3600;
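
One way to choose the new value is to start from an observed startup time, such as the number of seconds reported in the "not yet available" message in the server_restarter log. The following Python sketch pads that time and rounds it up to a whole minute. The 25% headroom and the function name suggested_grace are assumptions for illustration, not Foglight recommendations.

```python
import math


def suggested_grace(observed_startup_seconds, headroom=1.25, default=3600):
    """Return a startup.grace value (seconds) above the observed startup time."""
    padded = observed_startup_seconds * headroom
    # Round up to the next whole minute so the value is easy to read.
    rounded = math.ceil(padded / 60) * 60
    # Never go below the default grace period.
    return max(default, rounded)
```

For an observed startup time of 5401 seconds, suggested_grace(5401) returns 6780; for a server that starts in 600 seconds, it returns the default of 3600.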

RESOLUTION 5

1. Increase the value of "startup.grace" from 3600 to 7200 in the file %FMS_HOME%\config\restart_monitor.config:

startup.grace = 7200;

2. Restart the FMS and perform the upgrade again.
