In the "Alarms" dashboard, the graph "All System Alarms & Changes" displays many more alarms than the alarms table below the graph. When an alarm is cleared, the graph goes down, but not to 0.
Two reasons for this:
The first reason is SLA alarms. These alarms are not displayed in the alarms table, but the graph above the table counts them. The table heading reads: "xxx Current Alarm(s) [...] (Not including SLA Alarms)". SLA alarms are service alarms, which indicate that something is going wrong with a service component. Each SLA alarm is escalated to the next higher component until the top-level component is reached, and a new alarm is created at each level.
For example:
A service has three levels: top level: database; second level: database version (Oracle or SQL Server); third level: database instances (ORCL_1, ORCL_2, Named_Instance_1, Named_Instance_2, ...). Now one database has a problem and raises an alarm (e.g. data files are full and cannot be extended). In addition to the specific alarm, an SLA alarm is created for the affected database on the third level. Another SLA alarm is created on the second level for the component (e.g. Oracle), and finally a third SLA alarm is created for the top-level database component. That means: for one specific alarm, three additional SLA alarms are created.
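The escalation in this example can be sketched as a small model. This is only an illustration of the counting, not product code: the function name and the component names in `path` are hypothetical, and the sketch assumes the graph counts each SLA alarm once.

```python
# Illustrative model only; none of these names belong to a product API.

def sla_alarms_for(path):
    """Return the SLA alarms created for an alarm on the last component of `path`.

    `path` lists the service components from the top level down to the
    affected component. One SLA alarm is created per level, starting at
    the affected component and escalating upward.
    """
    return [f"SLA alarm on {component}" for component in reversed(path)]


# Hypothetical three-level service from the example above.
path = ["Databases", "Oracle", "ORCL_1"]
sla = sla_alarms_for(path)

graph_count = 1 + len(sla)  # the specific alarm plus its SLA alarms
table_count = 1             # the table does not include SLA alarms

for alarm in sla:
    print(alarm)
print("graph:", graph_count, "table:", table_count)
```

Under these assumptions, a single specific alarm adds four entries to the graph but only one row to the table, which is the discrepancy described above.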
The Service Level Alarm is fired once per hour, but it is displayed only once. You will not see multiple Service Level Alarms on the dashboard, because the dashboard only indicates that an uncleared Service Level Alarm exists.
The second reason is orphaned alarms. An alarm can lose its relation to the topology object it is assigned to. This happens very rarely, for example if data is deleted via the Data Management dashboard. It is most likely to happen by accident during heavy testing, such as installing and uninstalling agents or deleting data via Data Management.
Attached is a Groovy script that deletes orphaned alarms.
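The attached script is not reproduced here. As a rough conceptual sketch only, an orphaned alarm can be thought of as an alarm whose topology object reference no longer resolves. The `Alarm` class, its fields, and `find_orphaned` below are hypothetical names for illustration and do not reflect the attached script or any product API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    # Hypothetical minimal alarm record for illustration.
    alarm_id: str
    topology_object_id: Optional[str]  # None if the reference is missing


def find_orphaned(alarms, existing_object_ids):
    """Return alarms whose topology object is missing or no longer exists."""
    return [a for a in alarms if a.topology_object_id not in existing_object_ids]


existing = {"host-1", "db-ORCL_1"}
alarms = [
    Alarm("a1", "host-1"),
    Alarm("a2", "db-deleted"),  # object was deleted, alarm is orphaned
    Alarm("a3", None),          # reference lost, alarm is orphaned
]

for alarm in find_orphaned(alarms, existing):
    print(alarm.alarm_id)
```

Alarms like these would still be counted by the graph while having no object to appear under, which is why the graph does not return to 0.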