An environment has many SQL Server availability groups where the secondary replica is not available for monitoring.
These availability groups can fail over at any time between two servers. In some cases, there is a two-node cluster where the availability groups are split between the two nodes, with some availability groups active on each node. Since the databases are not available on the secondary, many Days Since Last Backup alarms are fired.
Foglight’s current logic depends on the configured backup preference rather than checking the actual backup history (from msdb) across all AG replicas. This results in false backup alarms during valid failover scenarios.
If the failover remains in effect for an extended period and backup jobs shift accordingly (i.e., full backups now run on the new Primary), Foglight may still ignore the new Primary based on the backup preference setting. This behavior can result in gaps in monitoring and missed alarms for actual backup failures, or conversely, false alarms despite successful backups.
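The difference between evaluating only the preferred replica and evaluating the backup history of all replicas can be illustrated with a minimal sketch. This is not Foglight's implementation. The replica names NODE1 and NODE2, the timestamps, and the function days_since_last_backup are hypothetical, and the input is assumed to hold the newest full backup finish time that msdb records on each replica.

```python
from datetime import datetime
from typing import Dict, Optional

# Hypothetical input: newest full backup finish time per replica for one database,
# as it could be read from msdb on each replica (None = no backup recorded there).
BackupHistory = Dict[str, Optional[datetime]]


def days_since_last_backup(history: BackupHistory, now: datetime) -> Optional[float]:
    """Return the days since the newest backup on any replica, or None if none exists."""
    finished = [ts for ts in history.values() if ts is not None]
    if not finished:
        return None
    return (now - max(finished)).total_seconds() / 86400


now = datetime(2024, 1, 10, 12, 0)
history = {
    "NODE1": datetime(2024, 1, 2, 12, 0),  # preferred replica, last backup before failover
    "NODE2": datetime(2024, 1, 10, 0, 0),  # new primary, backups run here after failover
}

# Looking only at the preferred replica reports a stale value.
preferred_only = days_since_last_backup({"NODE1": history["NODE1"]}, now)
# Looking at every replica picks up the backup taken on the new primary.
all_replicas = days_since_last_backup(history, now)
print(preferred_only, all_replicas)
```

With a one-day threshold, the first value would fire an alarm even though the database was backed up on the other node a few hours earlier.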
RESOLUTION 1
In this environment the secondary node was always READ_ONLY by default. Backups were therefore not taken on the secondary replica and were performed on the primary replica, with the backup preference in the AG set to "Prefer Secondary".
Select "Primary" as the backup preference in the AG. The alarm will then filter out the secondary replica by "preferred_replica_for_backup=0".
RESOLUTION 2
Enable the COPY_ONLY option and remove the "updateability check" step from the backup job. Once COPY_ONLY is checked, the full backup can be executed. Both the primary and the secondary can then be backed up, so the alarm will not trigger.
RESOLUTION 3
Workaround
None
Status
Enhancement ID FOG-239 has been logged to enhance the backup alarms for availability groups. This is planned for an upcoming release of the SQL Server cartridge. Included with the logic change will be the following background note:
"For Availability Group, Foglight checks all availability replicas in an availability group. The alarm is not raised if at least one replica backs up the database.
Requirements: For this functionality to work, ensure the Foglight connection name matches the name in column replica_server_name in sys.availability_replicas."
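The rule described in the note, together with its naming requirement, can be sketched as follows. This is an illustration under stated assumptions, not the cartridge's code. The function should_raise_alarm, its parameters, and the exact string comparison between connection names and replica_server_name are hypothetical.

```python
from typing import Dict, Iterable, Optional


def should_raise_alarm(
    connection_names: Iterable[str],
    days_since_backup_by_replica: Dict[str, Optional[float]],
    threshold_days: float,
) -> bool:
    """Raise only if no monitored replica has a backup within the threshold.

    Keys of days_since_backup_by_replica stand for replica_server_name values
    from sys.availability_replicas; None means no backup is recorded.
    """
    monitored = set(connection_names)
    for replica_server_name, days in days_since_backup_by_replica.items():
        # Assumption: a replica counts only when its name exactly matches
        # a monitored connection name.
        if replica_server_name not in monitored:
            continue
        if days is not None and days <= threshold_days:
            return False  # at least one replica backed up the database
    return True


replicas = {"NODE1": 8.0, "NODE2": 0.5}

# Both connection names match, so the recent backup on NODE2 suppresses the alarm.
matched = should_raise_alarm(["NODE1", "NODE2"], replicas, threshold_days=1.0)

# NODE2 is monitored under a different name, so its backup is not seen.
mismatched = should_raise_alarm(["NODE1", "node2.example.com"], replicas, threshold_days=1.0)
print(matched, mismatched)
```

The second call shows why the requirement matters in this sketch: a replica monitored under a name that differs from replica_server_name contributes nothing, and the alarm is raised despite a successful backup.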