Investigating and Troubleshooting Dashboard Metric, Rule, and Alarm Issues (4308244)

Description
This article describes how to investigate and troubleshoot Foglight dashboard, metric, rule, and alarm issues, such as:
- Alarms not firing as expected
- Incorrect metric values
- Missing or delayed alarm data
- Duplicate alarms
It also explains how Foglight collects and processes monitoring data.

Cause

These issues can stem from configuration settings, disabled rules, data collection failures, or temporary conditions in the monitored environment.

Common causes include:

Cause ID	Description
1	Threshold is too high for the rule to fire
2	Alarm email notifications are disabled in the database agent administration settings
3	Rules are disabled in the Rules Management dashboard (Not applicable to Foglight Cloud)
4	Alarm Service is stopped (Not applicable to Foglight Cloud)
5	FMS lacks sufficient resources to process data (Not applicable to Foglight Cloud)
6	No qualifying data was collected to trigger the alarm
7	A blackout is configured for the agent
8	High volume of alarms in the database (Not applicable to Foglight Cloud)
9	Email notifications are sent by another FMS instance (Not applicable to Foglight Cloud)
10	Custom or duplicate rules exist (Not applicable to Foglight Cloud)
11	Connection to the host or database fails
12	Collection frequency is too low, or the data event occurs between collections
13	Email server settings are not configured
14	Oracle Alert Log or SQL Server Error Log filtering is set to fire only on Fatal or OFF messages
15	Cloned rule is misconfigured (Not applicable to Foglight Cloud)
16	Data issue in the collection query
17	Temporary system issue (e.g., a long-running query or an autoextending tablespace)
18	Alarm is disabled in the assigned alarm template
19	Minimal workload requirement is not met for an Oracle agent
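Cause 12 is easiest to see with a toy model. The minimal Python sketch below assumes that the agent samples a metric once per collection interval and that the rule evaluates only those sampled values; the workload, threshold, and intervals are hypothetical.

```python
def sample(signal, interval_s, duration_s):
    """Return the (time, value) pairs a collector polling every interval_s seconds would see."""
    return [(t, signal(t)) for t in range(0, duration_s + 1, interval_s)]


def would_fire(signal, interval_s, duration_s, threshold):
    """True if any sampled value exceeds the threshold (assumed fixed-threshold rule)."""
    return any(value > threshold for _, value in sample(signal, interval_s, duration_s))


def short_spike(t):
    # Hypothetical workload: 95% for 60 seconds starting at t=130s, 20% otherwise.
    return 95.0 if 130 <= t < 190 else 20.0


for interval in (300, 60):
    fired = would_fire(short_spike, interval, 600, threshold=90.0)
    print(f"collection interval {interval}s -> alarm fired: {fired}")
```

With a 300-second interval, the samples at 0, 300, and 600 seconds all miss the spike; with a 60-second interval, the sample at 180 seconds catches it. This is the reasoning behind increasing the collection frequency for short-lived events.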

Resolution

General Troubleshooting Steps

- Check each affected dashboard screen or collection for missing data or timeouts.
- Validate metric data in the Foglight UI or topology browser.
- Check agent logs for collection failures or query timeouts.
- Review agent configuration settings (collection frequency, batch size, credentials).
- Check alarm history for patterns or recent changes.
- Inspect the rule logic and thresholds that generate the alarm.

Component-Level Overview

Metrics

Metrics are objects in the FMS that store collected data in segments:
- History: the full collected history
- Latest: the most recent value collected
- Period: the aggregated (average) value over the selected time range
- Current: the most recent value within the selected time range
The Current and Latest values can match, depending on the selected time range (e.g., “Last Hour”, which ends at the present).
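The relationship between these segments can be sketched in Python. This is an illustrative model only, assuming a time-sorted list of samples, an inclusive time range, and a plain average for Period; it does not use or mirror any Foglight API, and the `Sample` type and `segments` function are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    ts: int       # collection time, seconds since epoch
    value: float


def segments(history, range_start, range_end):
    """Derive Latest, Period, and Current from a time-sorted list of samples.

    Assumptions: history is sorted by ts and the selected range is inclusive.
    Period is modeled as a plain average of the samples in range.
    """
    in_range = [s for s in history if range_start <= s.ts <= range_end]
    return {
        "latest": history[-1].value if history else None,
        "period": mean(s.value for s in in_range) if in_range else None,
        "current": in_range[-1].value if in_range else None,
    }


history = [Sample(t, v) for t, v in [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 40.0), (7200, 50.0)]]
print(segments(history, 3600, 7200))  # range ends at the newest sample
print(segments(history, 0, 3600))     # an earlier, historical range
```

In the first call, Current and Latest are both 50.0 because the range ends at the newest sample; in the second, Current is 30.0 while Latest stays 50.0.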

Investigation Tip:
Use Configuration > Data or Administration > Tooling > Script Console to access raw metric data.

Collections

Most collections are SQL queries or OS commands executed by the agent via the FglAM.
Collection issues may appear in the agent log when:
- There is a timeout or permission error.
- The data collected is invalid or malformed.

Investigation Tip:
Check the agent log file for errors tied to specific collections. Re-run the query directly against the monitored database or host to validate the result.
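When re-running a collection query by hand, it helps to record both the result and how long it took. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the monitored database; against a real Oracle, SQL Server, or DB2 instance you would use that database's own driver or client tool. The table, query, timeout, and alarm value are all hypothetical.

```python
import sqlite3
import time


def rerun_collection(conn, sql, timeout_s):
    """Run a query and report (rows, elapsed seconds, exceeded_timeout).

    This only measures elapsed time after the query finishes; unlike an
    agent timeout, it does not cancel a query that runs too long.
    """
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed, elapsed > timeout_s


# Stand-in data: a hypothetical table holding the value the rule evaluates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablespace_usage (name TEXT, pct_used REAL)")
conn.executemany("INSERT INTO tablespace_usage VALUES (?, ?)",
                 [("USERS", 72.5), ("SYSTEM", 91.0)])

rows, elapsed, slow = rerun_collection(
    conn, "SELECT name, pct_used FROM tablespace_usage ORDER BY name", timeout_s=30.0)

alarm_value = 91.0  # hypothetical value quoted in the alarm message for SYSTEM
matches = dict(rows).get("SYSTEM") == alarm_value
print(rows, f"{elapsed:.4f}s", "slow" if slow else "ok",
      "matches alarm" if matches else "differs from alarm")
```

A result that differs from the alarm context points toward a data issue in the collection (Cause 16), while a query that runs close to the configured timeout suggests a temporary system issue or a timeout setting that needs review.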

Agent Configuration

Agent behavior is defined by:

- Hostname, credentials, and connection parameters
- Collection frequencies and batch sizes
- Timeout values

Many agents log their configuration at startup.

Investigation Tip:
Verify and adjust relevant settings as needed. Use the “Validate connection” feature to test credentials.
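If Validate connection fails, a useful first question is whether the database port is reachable at all from the FglAM host. The minimal Python sketch below opens a TCP connection with the standard `socket` module; it checks network reachability only, not credentials or permissions. The `port_reachable` helper is hypothetical, the host names are placeholders, and 1521 and 1433 are only the default Oracle listener and SQL Server ports.

```python
import socket


def port_reachable(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Only network reachability is tested; credentials and database
    permissions still need the agent's Validate connection check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, timeouts, and name-resolution failures
        return False


# Example usage (placeholder hosts; substitute your own hosts and ports):
# port_reachable("oracle-db.example.com", 1521)
# port_reachable("sqlserver.example.com", 1433)
```

Run it from the FglAM host, since a port that is reachable from your workstation may still be blocked between the FglAM and the monitored database.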

Alarms

Alarms are triggered by rules based on threshold or Boolean logic.
Alarms have multiple severity levels:

Color	Severity
Red	Fatal
Orange	Critical
Yellow	Warning

Some rules use baseline thresholds, which automatically learn the normal range of a metric over time.
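To confirm that a fixed-threshold rule can fire at all (for example, by temporarily lowering its threshold), it helps to reason about which severity a given value should produce. The Python sketch below assumes that higher values are worse, that thresholds are ordered warning <= critical <= fatal, and that reaching a threshold counts as crossing it; real Foglight rules define their own conditions, so treat the `fixed_threshold_severity` helper as hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    NORMAL = 0
    WARNING = 1   # yellow
    CRITICAL = 2  # orange
    FATAL = 3     # red


def fixed_threshold_severity(value, warning, critical, fatal):
    """Return the highest severity whose threshold the value meets or exceeds.

    Assumes higher values are worse and warning <= critical <= fatal.
    """
    if value >= fatal:
        return Severity.FATAL
    if value >= critical:
        return Severity.CRITICAL
    if value >= warning:
        return Severity.WARNING
    return Severity.NORMAL


# A value of 70 stays Normal at the configured thresholds...
print(fixed_threshold_severity(70, warning=80, critical=90, fatal=95).name)
# ...but raises a Warning once the warning threshold is temporarily lowered to 60.
print(fixed_threshold_severity(70, warning=60, critical=90, fatal=95).name)
```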

Investigation Tip:
Examine historical alarm patterns and alarm messages for inconsistencies or gaps.
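When reviewing alarm history, repeated alarms for the same rule and object within a short span often point to a flapping condition or to custom or duplicate rules. The Python sketch below assumes a hypothetical export in which each alarm is a (rule, target, raised_at) record; the record type, rule names, and `repeated_alarms` helper are illustrative and are not a Foglight API.

```python
from typing import NamedTuple


class AlarmRecord(NamedTuple):
    # Hypothetical export format: one record per alarm raised.
    rule: str
    target: str
    raised_at: int  # seconds since epoch


def repeated_alarms(records, window_s, min_count=2):
    """Return {(rule, target): count} for pairs that raised at least
    min_count alarms within some window_s-second span."""
    times_by_key = {}
    for rec in sorted(records, key=lambda rec: rec.raised_at):
        times_by_key.setdefault((rec.rule, rec.target), []).append(rec.raised_at)

    flagged = {}
    for key, times in times_by_key.items():
        start = best = 0
        for end, t in enumerate(times):
            # Slide the window start forward until it spans at most window_s seconds.
            while t - times[start] > window_s:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_count:
            flagged[key] = best
    return flagged


alarms = [
    AlarmRecord("Rule A", "db1", 0),
    AlarmRecord("Rule A", "db1", 60),
    AlarmRecord("Rule A", "db1", 120),
    AlarmRecord("Rule B", "db2", 0),
    AlarmRecord("Rule B", "db2", 5000),
    AlarmRecord("Rule A", "db2", 100),
]
print(repeated_alarms(alarms, window_s=300))
```

Only the "Rule A" alarms on db1 are flagged here, because they are the only pair raised more than once within 300 seconds; the window size is a judgment call that should roughly match the collection frequency.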

Rules

Each rule evaluates one or more metrics.
Rule types include:
- Fixed threshold
- Baseline threshold
- Boolean condition
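The difference between these rule types can be sketched in Python. The baseline model here (mean plus k population standard deviations of recent history) is a simple stand-in chosen for illustration, not Foglight's actual baseline algorithm, and the Boolean condition and its `connected` key are hypothetical.

```python
from statistics import mean, pstdev


def fixed_breach(value, threshold):
    """Fixed threshold: fire when the value exceeds a constant."""
    return value > threshold


def baseline_breach(history, value, k=3.0):
    """Baseline (illustrative model): fire when the value is more than k
    population standard deviations above the mean of recent history."""
    return value > mean(history) + k * pstdev(history)


def boolean_breach(collected):
    """Boolean condition (hypothetical): fire when the collected state
    reports that the agent could not connect."""
    return collected.get("connected") is False


recent = [10, 12, 11, 13, 9, 10, 12, 11]  # hypothetical recent values, mean 11
print(fixed_breach(20, threshold=80))   # 20 is far below a fixed limit of 80
print(baseline_breach(recent, 20))      # but well above this host's usual range
print(boolean_breach({"connected": False}))
```

A value of 20 passes the fixed rule yet breaches the baseline, which is one reason a baseline rule can fire when a fixed-threshold rule on the same metric does not.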

Investigation Tip:
Compare rule behavior to an out-of-the-box (OOTB) rule from the same cartridge version. Roll back custom rules to confirm whether the issue is caused by the customization.

Related Issue Resolutions

Each resolution ID corresponds to the cause with the same ID in the Cause table.

Resolution ID	Action
1	Temporarily lower the rule threshold to verify that the alarm triggers
2	Enable alarm email notifications (KB 4310657)
3	Enable rules via Rules Management (Administration > Rules & Notifications > Rules)
4	Restart the FMS process, or restart the Alarm Service via the JMX console
5	Review and adjust FMS/FglAM memory (heap) settings; consider a reboot or failover
6	Verify that the system generates qualifying data and that the agent is running
7	Remove active blackouts in Global View > Settings > Manage Alarm Blackouts
8	Purge old alarms (KB 4295903)
9	Confirm whether email notifications are coming from another FMS
10	Review the rule for customizations (e.g., custom name, email format) and compare it to the default
11	Validate connectivity (Oracle: KB 4235896; SQL Server: KB 4229902; DB2: KB 4295717)
12	Increase the agent collection frequency (KB 4308784)
13	Verify the SMTP/email configuration (KB 4352966)
14	Adjust the SQL Server Error Log or Oracle Alert Log filtering settings
15	Use the original OOTB rule configuration for best support compatibility
16	Re-run the collection SQL and compare the result with the alarm context
17	Review raw metrics for anomalies in Configuration > Data
18	Ensure the alarm is enabled in the assigned alarm template (KB 4226517)
19	Review the FMS log to confirm whether the minimal workload has been met (KB 4380206)

Additional Information

- Use debug mode in the FglAM to capture additional log detail.
- Use the Script Console for advanced queries against topology objects.
- Review the agent and rule documentation for each cartridge for rule-specific details.
