This article outlines how to investigate and troubleshoot Foglight dashboard, metric, rule, and alarm issues, and explains how Foglight collects and processes monitoring data. Typical issues include:
Alarms not firing as expected
Incorrect metric values
Missing or delayed alarm data
Duplicate alarms
These issues have many possible causes, including configuration settings, disabled rules, data collection failures, and temporary conditions in the monitored environment.
Common causes include:
Cause ID | Description |
---|---|
1 | Threshold is too high for the rule to fire |
2 | Alarms Email Notifications are disabled in the database agent's administration settings |
3 | Rules are disabled in the Rules Management dashboard (Not applicable to Foglight Cloud) |
4 | Alarm Service is stopped (Not applicable to Foglight Cloud) |
5 | FMS lacks sufficient resources to process data (Not applicable to Foglight Cloud) |
6 | No matching data was collected to trigger the alarm |
7 | A blackout is configured for the agent |
8 | High volume of alarms in the database (Not applicable to Foglight Cloud) |
9 | Email notifications are sent by another FMS instance (Not applicable to Foglight Cloud) |
10 | Custom or duplicate rules exist (Not applicable to Foglight Cloud) |
11 | Connection to the host or database fails |
12 | Collection frequency is too low, so the triggering event occurs between collections |
13 | Email server settings are not configured |
14 | Alert/error log filtering is set to fire only Fatal or OFF messages |
15 | Misconfigured cloned rule (Not applicable to Foglight Cloud) |
16 | Data issue in the collection query |
17 | Temporary system issue (e.g., a long-running query or an autoextending tablespace) |
18 | Alarm is disabled in the assigned alarm template |
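Cause 12 is easy to underestimate. An agent only sees the values it samples, so a short-lived spike that starts and ends between two collections never reaches the rule. The Python sketch below models this with a hypothetical CPU metric and a hypothetical threshold of 90. It is an illustration only: it is not Foglight code and calls no Foglight API.

```python
def cpu_percent(t):
    """Hypothetical CPU %: steady at 20, with a 90-second spike to 98 from t=130 to t=220."""
    return 98 if 130 <= t < 220 else 20


def alarm_fires(signal, interval_s, duration_s, threshold):
    """Sample `signal` every `interval_s` seconds and report whether any sample exceeds `threshold`."""
    samples = [signal(t) for t in range(0, duration_s, interval_s)]
    return any(value > threshold for value in samples)


for interval in (300, 60):
    fired = alarm_fires(cpu_percent, interval, 900, threshold=90)
    print(f"collection every {interval}s -> alarm fired: {fired}")
# collection every 300s -> alarm fired: False
# collection every 60s -> alarm fired: True
```

Sampling every 300 s reads t = 0, 300, and 600, so it misses the spike entirely. Sampling every 60 s reads t = 180, which falls inside the spike. This is the reasoning behind increasing the collection frequency. Keep in mind that more frequent collections also mean more queries or commands run against the monitored host.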
Review the dashboard or UI for missing data or timeouts on specific screens or collections.
Validate metric data in the Foglight UI or topology browser.
Check agent logs for collection failures or query timeouts.
Review agent configuration settings (frequency, batch size, credentials).
Check alarm history for patterns or recent changes.
Inspect the rule logic and thresholds used to generate the alarm.
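When you review alarm history for patterns, look for alarms raised on the same object within seconds of each other. These are a quick signal of duplicate or cloned rules (causes 10 and 15). The sketch below assumes the history has been copied into simple (rule, object, time raised) tuples. The rule names, object names, tuple layout, and 60-second window are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical alarm-history rows: (rule name, monitored object, time raised).
alarms = [
    ("CPU Utilization", "SQL01", datetime(2025, 1, 6, 9, 0, 5)),
    ("CPU Utilization (custom copy)", "SQL01", datetime(2025, 1, 6, 9, 0, 7)),
    ("CPU Utilization", "SQL01", datetime(2025, 1, 6, 9, 0, 9)),
    ("Tablespace Usage", "ORA01", datetime(2025, 1, 6, 11, 30, 0)),
]


def near_duplicates(rows, window=timedelta(seconds=60)):
    """Return consecutive alarm pairs raised on the same object within `window` of each other."""
    ordered = sorted(rows, key=lambda row: (row[1], row[2]))  # by object, then by time
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if prev[1] == curr[1] and curr[2] - prev[2] <= window
    ]


alarms_per_rule = Counter(rule for rule, _, _ in alarms)
print(alarms_per_rule.most_common())
for prev, curr in near_duplicates(alarms):
    print(f"{curr[1]}: '{prev[0]}' and '{curr[0]}' fired {(curr[2] - prev[2]).seconds}s apart")
```

In the sample rows, two differently named rules fire on SQL01 within two seconds of each other. That pattern suggests checking for a cloned or custom rule before investigating the data itself.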
Metrics are objects in the FMS that store collected data in segments:
History: Full collected history
Latest: Most recent value collected
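Period: Aggregated (average) value over the selected time range
Current: Latest value in the selected time range
The Current and Latest values can match, depending on the selected time range. When the range ends at the present (e.g., “Last Hour”), the latest value in the range is also the most recent value collected.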
Investigation Tip:
Use Configuration > Data or Administration > Tooling > Script Console to access raw metric data.
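The relationship between the segments can be modeled in a few lines of Python. This is a simplified illustration, not how the FMS stores metrics. It assumes that history is a time-ordered list of hypothetical (time, value) samples, that Period is the plain average of the samples inside the selected range, and that Current is the last sample inside that range.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    t: int        # collection time in seconds (hypothetical clock)
    value: float


def segments(history, range_start, range_end):
    """Derive Latest, Period, and Current from a time-ordered history for one selected range."""
    in_range = [s for s in history if range_start <= s.t <= range_end]
    return {
        "latest": history[-1].value if history else None,                 # newest value overall
        "period": mean(s.value for s in in_range) if in_range else None,  # average over the range
        "current": in_range[-1].value if in_range else None,              # newest value in the range
    }


history = [Sample(0, 10.0), Sample(60, 20.0), Sample(120, 30.0), Sample(180, 40.0)]
print(segments(history, 120, 180))  # {'latest': 40.0, 'period': 35.0, 'current': 40.0}
print(segments(history, 0, 60))     # {'latest': 40.0, 'period': 15.0, 'current': 20.0}
```

In the first call, Current matches Latest because the range ends at the newest sample. In the second call, the range lies in the past, so the two values diverge.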
Most collections are SQL queries or OS commands executed by the agent via the FglAM.
Collection issues may appear in the logs when:
There is a timeout or permission error.
The collected data is invalid or malformed.
Investigation Tip:
Check the agent log file for errors tied to specific collections. Re-run the query or command directly on the monitored host to validate the results.
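A manual re-run is easier to interpret when each outcome is mapped to one of the log symptoms listed above. The Python sketch below assumes an OS-command collection that should print a single number. The commands, the timeout, and the outcome labels are hypothetical. The agent itself runs collections through the FglAM, not through code like this.

```python
import subprocess
import sys


def run_collection(cmd, timeout_s):
    """Run a collection-style command and classify the result as
    ok, timeout, permission_error, command_failed, or malformed."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", None
    except PermissionError as exc:  # the executable itself could not be launched
        return "permission_error", str(exc)
    if result.returncode != 0:
        # Permission problems reported by the command itself typically surface as a non-zero exit.
        return "command_failed", result.stderr.strip()
    output = result.stdout.strip()
    try:
        return "ok", float(output)  # assumption: the collection returns one numeric value
    except ValueError:
        return "malformed", output


print(run_collection([sys.executable, "-c", "print(42.5)"], timeout_s=5))
print(run_collection([sys.executable, "-c", "print('n/a')"], timeout_s=5))
print(run_collection([sys.executable, "-c", "import time; time.sleep(3)"], timeout_s=1))
```

The example commands use the running Python interpreter only so that the sketch is portable. In practice, substitute the actual query client or OS command that the collection uses.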
Agent behavior is defined by:
Hostname, credentials, connection parameters
Collection frequencies and batch sizes
Timeout values
Many agents log their configuration at startup.
Investigation Tip:
Verify and adjust relevant settings as needed. Use the “Validate connection” feature to test credentials.
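Before you change anything, it can help to write down the agent's current settings and check them for combinations that may cause missing or late data. The Python sketch below uses hypothetical field names, because each cartridge labels these settings differently. Its rule that a query timeout should be shorter than the collection interval is a general heuristic, not a documented Foglight requirement.

```python
from dataclasses import dataclass


@dataclass
class AgentSettings:
    # Hypothetical field names; map them to the labels your agent's properties use.
    hostname: str
    username: str
    collection_interval_s: int
    query_timeout_s: int
    batch_size: int


def settings_warnings(cfg):
    """Return human-readable warnings for settings that may explain missing or late data."""
    warnings = []
    if not cfg.hostname.strip():
        warnings.append("hostname is empty")
    if not cfg.username.strip():
        warnings.append("credentials are missing a user name")
    if cfg.query_timeout_s >= cfg.collection_interval_s:
        # Heuristic: a query allowed to run longer than the interval can overlap the next collection.
        warnings.append("query timeout is not shorter than the collection interval")
    if cfg.batch_size <= 0:
        warnings.append("batch size must be positive")
    return warnings


print(settings_warnings(AgentSettings("ora01.example.com", "foglight", 300, 60, 100)))
print(settings_warnings(AgentSettings("ora01.example.com", "", 300, 600, 100)))
```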
Alarms are triggered by rules based on threshold or Boolean logic.
Alarms have multiple severity levels:
Color | Severity |
---|---|
Red | Fatal |
Orange | Critical |
Yellow | Warning |
Some rules use baseline thresholds, which are learned automatically from the metric's historical behavior over time.
Investigation Tip:
Examine historical alarm patterns and alarm messages for inconsistencies or gaps.
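Foglight's own baseline calculation is not described here. The sketch below therefore uses a common, generic technique as a stand-in: a rolling mean plus k standard deviations over the preceding window. It only shows why a baseline-based rule can flag a value that a fixed threshold would ignore. The window size, k, and the sample series are arbitrary assumptions.

```python
from statistics import mean, stdev


def baseline_breaches(values, window=10, k=3.0):
    """Return the indices whose value exceeds mean + k * stdev of the preceding `window` values.
    Generic rolling-baseline technique (window must be >= 2); not Foglight's algorithm."""
    breaches = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        upper = mean(recent) + k * stdev(recent)
        if values[i] > upper:
            breaches.append(i)
    return breaches


# Hypothetical response times (ms): steady around 50, one spike at index 11.
response_ms = [50, 52, 49, 51, 50, 48, 52, 51, 49, 50, 51, 95, 50]
print(baseline_breaches(response_ms))  # [11]
```

A fixed threshold of 100 ms would never fire on this series, while the baseline flags index 11. In this sketch, the spike also widens the band for the next `window` samples. A second spike shortly afterwards would therefore have to be larger to register.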
Each rule evaluates one or more metrics.
Rule types include:
Fixed threshold
Baseline threshold
Boolean condition
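The Python sketch below contrasts a fixed-threshold rule that has multiple severity levels with a Boolean-condition rule. In Foglight, rules are configured in Rules Management rather than written like this, and the threshold values are hypothetical. The assumption that the most severe matching level wins follows the severity table above; it is not a documented evaluation order.

```python
SEVERITIES = ("Fatal", "Critical", "Warning")  # most to least severe


def fixed_threshold(value, bounds):
    """Return the most severe level whose bound `value` meets or exceeds, or None if no level matches."""
    for severity in SEVERITIES:
        bound = bounds.get(severity)
        if bound is not None and value >= bound:
            return severity
    return None


def boolean_condition(condition_holds, severity="Critical"):
    """Fire at `severity` whenever the condition is true, regardless of any numeric value."""
    return severity if condition_holds else None


cpu_bounds = {"Warning": 80, "Critical": 90, "Fatal": 95}  # hypothetical thresholds (%)
for cpu in (50, 85, 92, 99):
    print(cpu, fixed_threshold(cpu, cpu_bounds))
print(boolean_condition(condition_holds=True))
```

With these bounds, a metric that peaks at 78% never raises an alarm, which is cause 1 in the table. Temporarily lowering the Warning bound is a quick way to confirm that the rule is being evaluated at all.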
Investigation Tip:
Compare rule behavior to an out-of-the-box (OOTB) rule from the same cartridge version. Roll back custom rules to confirm whether the issue is caused by the customization.
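Comparing a custom rule with its OOTB counterpart mostly comes down to listing the settings that differ. The sketch below assumes you have copied both rules' settings into Python dictionaries by hand. The setting names and values are hypothetical.

```python
def diff_rule_settings(ootb, custom):
    """Return {setting: (ootb_value, custom_value)} for every setting whose values differ."""
    keys = sorted(ootb.keys() | custom.keys())
    return {k: (ootb.get(k), custom.get(k)) for k in keys if ootb.get(k) != custom.get(k)}


# Hypothetical settings transcribed from the two rules.
ootb_rule = {"name": "CPU Utilization", "warning": 80, "critical": 90, "fatal": 95, "enabled": True}
custom_rule = {"name": "CPU Utilization - Custom", "warning": 80, "critical": 97, "fatal": 99,
               "enabled": True, "email_template": "ops-custom"}

for setting, (default, custom) in diff_rule_settings(ootb_rule, custom_rule).items():
    print(f"{setting}: OOTB={default!r} custom={custom!r}")
```

Each difference in thresholds, enabled state, or notification settings is a candidate to roll back when you test whether the customization causes the issue.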
Resolution ID | Action |
---|---|
1 | Temporarily lower rule threshold to verify alarm triggers |
2 | Enable Alarms Email Notifications (KB 4310657) |
3 | Enable rules via Rules Management (Administration > Rules & Notifications > Rules) |
4 | Restart the FMS process, or restart the Alarm Service via the JMX console |
5 | Review and adjust FMS/FglAM memory (heap) settings; consider a reboot or failover |
6 | Verify that the monitored system generates qualifying data and that the agent is running |
7 | Remove active blackouts from Global View > Settings > Manage Alarm Blackouts |
8 | See KB 4295903 for purging old alarms |
9 | Confirm whether email notifications are coming from another FMS |
10 | Review the rule for customizations (e.g., custom name, email format) and compare it to the default |
11 | Validate connectivity (see KB 4235896 for Oracle, KB 4229902 for SQL Server, KB 4295717 for DB2) |
12 | Increase agent collection frequency (KB 4308784) |
13 | Verify SMTP/email configuration (KB 4352966) |
14 | Adjust SQL Server Error Log or Oracle Alert Log filtering settings |
15 | Use the original OOTB rule configuration for best support compatibility |
16 | Re-run the collection SQL and compare the result with the alarm context |
17 | Review raw metrics for anomalies in Configuration > Data |
18 | Ensure alarm is enabled in the assigned alarm template (KB 4226517) |
Use debug mode in the FglAM to capture additional log detail.
Use the Script Console for advanced queries against topology objects.
Review the agent and rule documentation for each cartridge for rule-specific details.
© 2025 Quest Software Inc. ALL RIGHTS RESERVED.