Alarms are one of the biggest factors affecting the performance of Foglight, so it is important to make sure that the Foglight environment does not generate or retain an excessive number of alarms. A common misconception is that only uncleared alarms affect performance. This is not true: old alarms that remain in the repository can also degrade the performance of the Foglight Management Server (FMS).
The FMS periodically queries the alarms table to determine the health of all managed topology objects. If this table is especially large, these queries become very expensive and can make the server unresponsive. In some cases the alarm queries take so long that they do not complete before their next scheduled execution. You can detect this by searching the FMS logs for messages like the following:
2010-03-02 15:17:26.654 WARN [QuartzScheduler.Monitor_Worker-2] com.quest.nitro.service.monitor.AbstractMonitor - Alarm Counts Topology Monitor is still active. Scheduled execution will be skipped
2010-03-02 15:16:26.622 WARN [QuartzScheduler.Monitor_Worker-4] com.quest.nitro.service.monitor.AbstractMonitor - Aggregate Alarms Topology Monitor is still active. Scheduled execution will be skipped
These messages indicate that the monitors which query alarms to determine the health of topology objects (here, the Alarm Counts and Aggregate Alarms Topology Monitors) are still running when their next execution is due, so that execution is skipped.
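To check a large log file for these warnings without reading it by eye, a small program can count the skipped executions per monitor. The sketch below is a hypothetical helper, not part of Foglight: the class name SkippedMonitorScan and the regular expression are assumptions based only on the two sample lines above, and the log file path is passed as a command-line argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "still active ... will be skipped" warnings per monitor.
public class SkippedMonitorScan {
    // Assumed format, taken from the sample FMS log lines:
    // "... AbstractMonitor - <Monitor Name> is still active. Scheduled execution will be skipped"
    private static final Pattern SKIPPED = Pattern.compile(
        "AbstractMonitor - (.+?) is still active\\. Scheduled execution will be skipped");

    // Returns monitor name -> number of skipped executions, sorted by monitor name.
    static Map<String, Integer> countSkipped(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = SKIPPED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SkippedMonitorScan <fms-log-file>");
            System.exit(1);
        }
        // ISO-8859-1 decodes any byte sequence, so unexpected characters never abort the scan.
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        countSkipped(lines).forEach((monitor, n) ->
            System.out.println(monitor + ": " + n + " skipped execution(s)"));
    }
}
```

With Java 11 or later the file can be run directly, for example `java SkippedMonitorScan.java <path to FMS log>`. A steadily growing count for the alarm-related monitors suggests the alarms table is large enough to be worth checking as described next.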
To fix this issue, first determine the size of the alarms table by running the following SQL against the repository database:
SELECT COUNT(*) FROM Alarm_Alarm;
This returns the number of records in the Foglight alarms table. As a rule of thumb, more than 500,000 records in this table is likely to cause performance problems. Unless you have a requirement to keep alarms indefinitely, it is good practice to purge old alarms from this table once they are no longer needed. A simple Groovy script such as the one below, run from the Foglight script console, can perform this task; as written, it purges all alarms older than 11 days:
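// Foglight script console (Groovy): purge alarms older than the retention period
long now = System.currentTimeMillis()
int retentionDays = 11   // alarms newer than this many days are kept
Calendar threshold = Calendar.getInstance()
threshold.add(Calendar.DATE, -retentionDays)
// Purge alarms in the date range from the epoch (new Date(0)) up to the threshold date
server.get("AlarmService").purgeAlarms(new Date(0), threshold.getTime())
// The value of the last expression is shown as the script result
"done in " + (System.currentTimeMillis() - now) / 1000 + "s"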