FxM reports showed data delayed by 15-20 minutes, then recovered after about half an hour. Why did this happen?
The problem occurred only sporadically (2 incidents reported in the past few days), so a general check of metrics such as db load error, db load time, excessive data volume, and memory in system health did not reveal many clues. The number of aggregated metrics uploaded from the probes was also quite consistent. This environment has nearly 200 FxM GUI users, who might use FxM to check metrics or generate various reports.
The following log entry appeared in /var/log/mysql_query in the time frame when the problem happened:
# Time: 160401 9:37:09
# User@Host: root[root] @ localhost 
# Query_time: 2203.496265 Lock_time: 0.000184 Rows_sent: 393 Rows_examined: 5331668
SELECT HitCount,Hit.ResourceID,HitResource.ResourceID,GroupID,TimeStamp FROM Page.Hit,Page.HitResource WHERE TimeType = "3" AND BucketID >= "300" AND BucketID <= "309" AND ( TimeStamp >= "20160201050000" AND TimeStamp <= "20160331040000" ) AND GroupID = "2" AND HitResource.ResourceID = Hit.ResourceID AND REGEX_MATCH(Name, ".*scotiaonline.*rusteer.*bns.*") = "1" ORDER BY HitCount DESC;
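So it does look like some users are running reports that delay metric loading. The query ran for about 2,203 seconds (roughly 37 minutes), which is roughly in line with the length of the reported delay. Since the query was issued by the root MySQL user on localhost, it most likely came from a report in the UI. Applying the customized REGEX_MATCH function while examining more than 5 million rows (Rows_examined: 5331668) to return only 393 rows can be very expensive.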
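To check whether other long-running report queries line up with the delay windows, the slow query log can be scanned for entries above a duration threshold. The following is a minimal sketch, not an FxM tool: the function name find_slow_queries, the 600-second threshold, and the sample text are assumptions for illustration. The first sample entry uses the statistics from the log entry above with an abbreviated statement; the second entry is made up to show a fast query being filtered out.

```python
import re

# Matches the "# Query_time: ..." statistics line of a slow query log entry.
HEADER = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def find_slow_queries(log_text, min_seconds=600.0):
    """Return log entries whose Query_time is at least min_seconds (hypothetical helper)."""
    entries = []
    current = None  # entry whose statement lines are currently being collected
    for line in log_text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "query_time": float(match.group("query_time")),
                "rows_sent": int(match.group("rows_sent")),
                "rows_examined": int(match.group("rows_examined")),
                "statement": "",
            }
            entries.append(current)
        elif line.startswith("#"):
            # "# Time" / "# User@Host" lines belong to the next entry.
            current = None
        elif current is not None and line:
            current["statement"] = (current["statement"] + " " + line).strip()
    return [e for e in entries if e["query_time"] >= min_seconds]


# Sample input: first entry abbreviated from the log above, second entry invented.
SAMPLE = """# Time: 160401 9:37:09
# User@Host: root[root] @ localhost
# Query_time: 2203.496265 Lock_time: 0.000184 Rows_sent: 393 Rows_examined: 5331668
SELECT HitCount FROM Page.Hit ORDER BY HitCount DESC;
# Time: 160401 9:40:00
# User@Host: root[root] @ localhost
# Query_time: 1.5 Lock_time: 0.0001 Rows_sent: 1 Rows_examined: 10
SELECT 1;
"""

if __name__ == "__main__":
    for entry in find_slow_queries(SAMPLE):
        minutes = entry["query_time"] / 60
        print(f"{minutes:.1f} min, {entry['rows_examined']} rows examined: "
              f"{entry['statement'][:60]}")
```

In practice the text would be read from /var/log/mysql_query instead of SAMPLE, and the threshold adjusted to whatever duration is long enough to hold up metric loading.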