How to generate an alert only if a specific percentage of servers in a group are down?
Foglight can group objects into services and use the ServiceLevelEvaluation - FMSServiceSLP rule to raise an alert when a service's availability drops below a certain percentage over a 1-hour period. By default, a service is considered unavailable when any of its members has a Fatal alarm.
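The default behaviour can be pictured with a small Python model. This is a conceptual sketch only: the `Member` class, the severity strings, and both functions are hypothetical and are not part of Foglight. How the 1-hour availability is aggregated (here, the share of sampled checks in which the service was available) is also an assumption.

```python
# Conceptual model only: the names below are hypothetical, not Foglight APIs.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Member:
    name: str
    severity: str  # hypothetical alarm severity, e.g. "Normal" or "Fatal"


def service_available_default(members: list[Member]) -> bool:
    """Default rule: one member with a Fatal alarm makes the service unavailable."""
    return not any(m.severity == "Fatal" for m in members)


def availability_percent(samples: list[bool]) -> float:
    """Assumed aggregation: share of samples in the window (e.g. 1 hour)
    during which the service was available, as a percentage."""
    if not samples:
        return 100.0
    return 100.0 * sum(samples) / len(samples)


members = [Member("host1", "Normal"), Member("host2", "Fatal"), Member("host3", "Normal")]
print(service_available_default(members))  # False: host2 has a Fatal alarm
print(availability_percent([True, True, False, True]))  # 75.0
```

In this model a single Fatal member is enough to mark a sample as unavailable, which is why a large group of servers can drag the hourly availability down even when most of them are healthy.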
This default can be changed so that the service is treated as unavailable only when a specified percentage of its members are down.
In summary, this is implemented by creating a Service in Service Builder and adding the Hosts or other objects that make up the required Service.
The FSMServiceSLP_PercentageAvailableThreshold registry variable defines the percentage of objects within the service that is tolerated before the Service Availability is determined to be 0. To customize it for a specific service, scope a custom value of this variable to the FSMService object that represents that service.
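The effect of a scoped registry value can be sketched in Python as a lookup that prefers a per-service override over the global value. This is a hypothetical model, not Foglight code: the dictionaries and functions are illustrative, and it assumes the threshold is the minimum percentage of members that must be up for the service to count as available. The values used (100 globally, 75 for a service named `WebFarm`) are made up for the example.

```python
# Conceptual model only: the dictionaries and functions are hypothetical,
# not Foglight APIs. The threshold is assumed to be the minimum percentage
# of members that must be available for the service to count as available.
from __future__ import annotations

VAR = "FSMServiceSLP_PercentageAvailableThreshold"


def resolve_threshold(service: str,
                      global_values: dict[str, float],
                      scoped_values: dict[tuple[str, str], float]) -> float:
    """A value scoped to the service object overrides the global value."""
    return scoped_values.get((service, VAR), global_values[VAR])


def service_available(service: str,
                      member_down: list[bool],
                      global_values: dict[str, float],
                      scoped_values: dict[tuple[str, str], float]) -> bool:
    """True if the percentage of members that are up meets the threshold."""
    if not member_down:
        return True
    up = len(member_down) - sum(member_down)
    percent_up = 100.0 * up / len(member_down)
    return percent_up >= resolve_threshold(service, global_values, scoped_values)


# Illustrative values: 100.0 globally mimics "any member down" in this model;
# "WebFarm" has a scoped value allowing one host in four to be down.
global_values = {VAR: 100.0}
scoped_values = {("WebFarm", VAR): 75.0}

print(service_available("WebFarm", [False, False, False, True], global_values, scoped_values))    # True
print(service_available("WebFarm", [False, False, True, True], global_values, scoped_values))     # False
print(service_available("DBCluster", [False, False, False, True], global_values, scoped_values))  # False
```

Under this interpretation, lowering the scoped value widens the tolerance for that one service only, while every service without a scoped value keeps the global behaviour.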
Once this has been completed, the new tolerance will be applied to the specific service and will trigger the ServiceLevelEvaluation - FMSServiceSLP rule accordingly.