A SQL Server Agent job failed during a step at 5:36 AM. Despite the failure, Foglight did not trigger a job failure alarm. However, a long-running job alarm was raised for the same job at 5:22 AM, which is not typical behavior.
The job was scheduled to run every 15 minutes. On this day:
Additionally, Foglight’s rule logic includes a check on last_run_finish_time. If this timestamp matches a previous value, the rule may suppress the alarm to avoid duplicates, even if the job failed.
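The suppression described above can be illustrated with a minimal sketch. This is not Foglight's actual rule code. The class name FailureRule, the evaluate method, the status strings, and the calendar date are all hypothetical and chosen only for illustration. It shows how a check that compares last_run_finish_time against the previously seen value could drop a failure.

```python
from datetime import datetime
from typing import Optional


class FailureRule:
    """Hypothetical stand-in for a job-failure rule (not Foglight's code)."""

    def __init__(self) -> None:
        # Finish time seen on the previous evaluation, if any.
        self.last_seen_finish: Optional[datetime] = None

    def evaluate(self, status: str, last_run_finish_time: datetime) -> bool:
        """Return True if an alarm would be raised for this sample."""
        # Duplicate guard: an unchanged finish time is treated as
        # "already handled", regardless of the reported status.
        if last_run_finish_time == self.last_seen_finish:
            return False
        self.last_seen_finish = last_run_finish_time
        return status == "Failed"


# Illustrative timestamps only; the date is arbitrary.
rule = FailureRule()
finish = datetime(2024, 1, 1, 5, 36)
first = rule.evaluate("Succeeded", finish)   # new timestamp, no failure
second = rule.evaluate("Failed", finish)     # same timestamp, suppressed
third = rule.evaluate("Failed", datetime(2024, 1, 1, 5, 51))  # new timestamp
```

In this sketch the second sample reports a failure but raises no alarm, because its finish time matches the value already recorded.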
Summary
Foglight uses a query that retrieves only the most recent job-level status. If a subsequent run of the job starts after an earlier run and completes before the earlier run does, the status of the successful run overwrites the failed status of the earlier run.
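As a rough illustration of why a latest-only query can hide a failure, the sketch below compares reading only the most recent status with scanning every run since the last collection. The JobRun type, the function names, the status strings, and all timestamps other than the 5:36 AM failure are assumptions made for this example; they are not taken from the cartridge.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class JobRun:
    finished_at: datetime
    status: str  # "Succeeded" or "Failed" (hypothetical labels)


def latest_status(runs: List[JobRun]) -> str:
    """Latest-only view: status of the run that finished most recently."""
    return max(runs, key=lambda r: r.finished_at).status


def failures_since(runs: List[JobRun], since: datetime) -> List[JobRun]:
    """History view: every failed run that finished after `since`."""
    return [r for r in runs if r.status == "Failed" and r.finished_at > since]


# Illustrative data: a failure at 5:36, then a later successful run.
day = (2024, 1, 1)  # arbitrary date
runs = [
    JobRun(datetime(*day, 5, 36), "Failed"),
    JobRun(datetime(*day, 5, 40), "Succeeded"),
]
last_collection = datetime(*day, 5, 30)

seen_by_latest_only = latest_status(runs)
seen_by_history = failures_since(runs, last_collection)
```

With this data, the latest-only view reports a success, while the history view still returns the 5:36 AM failure.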
Workaround
None
Status
Enhancement Request ISMTS-393 has been logged to retrieve any job-level status, not just the most recent. This will be considered by Product Management for inclusion in a future release of the SQL Server cartridge.