There is a service which executes a simple query on the database. The latency used in the rule is the difference between the time the query starts and the time it completes. The query executes very quickly and always returns the result 1. The query is:
select count(*) from database_instance_id;
The measured time difference is submitted to the "databaseLatency" metric of the "CatalystDatabaseStatus" object. The rule compares the "databaseLatency" metric with the given thresholds (10, 20 or 50 ms) and triggers the corresponding alarm.
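The measurement and threshold comparison described above can be sketched roughly as follows. This is only an illustration, not the Foglight implementation: it uses an in-memory SQLite table as a stand-in for the real database, the function names `measure_latency_ms` and `classify_latency` are hypothetical, and the severity names attached to the 20 ms and 50 ms thresholds are assumptions (the article only confirms a warning alarm above 10 ms).

```python
import sqlite3
import time

# Thresholds in milliseconds come from the article. The severity names for
# 20 ms and 50 ms are assumptions made for this illustration.
THRESHOLDS_MS = [(50.0, "fatal"), (20.0, "critical"), (10.0, "warning")]


def classify_latency(latency_ms):
    """Return the highest severity whose threshold is exceeded, or None."""
    for threshold, severity in THRESHOLDS_MS:
        if latency_ms > threshold:
            return severity
    return None


def measure_latency_ms(connection, query):
    """Time one execution of the query, from start to result, in milliseconds."""
    start = time.perf_counter()
    connection.execute(query).fetchone()
    return (time.perf_counter() - start) * 1000.0


# Stand-in database (assumption): an in-memory SQLite table with one row,
# so that the count query returns 1 as described in the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE database_instance_id (id INTEGER)")
conn.execute("INSERT INTO database_instance_id VALUES (1)")

latency = measure_latency_ms(conn, "select count(*) from database_instance_id")
print(f"databaseLatency = {latency:.3f} ms, alarm = {classify_latency(latency)}")
```

Because the stand-in database is local and in memory, the measured value here contains no network time; against a remote database the same measurement would also include the request and response transfer.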
The latency includes several parts: the time to send the request from the Foglight server to the database, the time to execute the query on the database, and the time to send the result back to the Foglight server. Because the simple query has little impact on the database and normally executes very quickly, focus on network issues and database performance when the rule triggers an alarm.
Please note that the rule does not specify whether the problem is a network issue or an IO issue. High latency may be caused by several factors, such as poor network conditions, slow IO, poor database performance or poor performance on the Foglight server. The rule can only prompt you to check; it cannot identify the exact cause.
Currently the rule creates a warning alarm when the database latency is over 10 ms.