After migrating and upgrading Spotlight, the reported Performance Health Rating has changed. This article explains how the rating is scored, why it may now be reported as a much lower value than before, and how ratings from unhealthy to healthy are assigned.
Here's what is involved in the scoring:
Get the results of the Wait Statistics Details collection. This collection has two columns relating to wait time: the total overall wait and the signal wait, which represents time spent waiting for CPU. The total wait less the signal wait is the IO wait.
In the following calculations any wait types listed in the file Agent\conf\PACKAGE\sqlserver_spotlight\IdleWaits.txt are ignored.
1. Sum the IO wait for wait types whose name starts with PAGEIOLATCH to get a Page IO Latch Wait total.
2. Sum the IO wait for wait types in the IO category to get an IO Wait total.
3. Sum the signal wait for all wait types to get a CPU Wait total.
4. Sum the wait for all wait types to get a Total Wait total.
5. Sum the tasks waiting for wait types whose name starts with PAGEIOLATCH to get a Tasks waiting on Page IO Latch total.
6. Calculate IO latency to be the Page IO Latch Wait total divided by the Tasks waiting on Page IO Latch total.
7. Add IO Wait total and CPU Wait total to get the CPU and IO total.
8. Calculate the CPU and IO percent to be 100 times the CPU and IO total divided by the Total Wait total.
9. Calculate the DB time, which is a measure of how busy the instance is, to be the CPU and IO total divided by 60.
10. Calculate the IO latency score:
If IO latency is less than 5
Set the IO latency score to 100
Otherwise
Set the IO latency score to 100 - (10 * ((IO Latency - 5) / 5))
11. Calculate the CPU and IO score:
If the CPU and IO percent is greater than 95
Set the CPU and IO score to 100
Otherwise
Set the CPU and IO score to 100 - (10 * ((95 - CPU and IO Percent) / 5))
12. Calculate the Performance Health score:
If DB time is less than 25
Set the Performance Health score to 100
Otherwise
Set the Performance Health score to the average of the two scores: (IO latency score + CPU and IO score) / 2
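The twelve steps above can be sketched in Python as follows. This is an illustrative sketch only, not Spotlight's actual implementation: the WaitRow fields, the "IO" category label, the idle_waits parameter (standing in for the contents of IdleWaits.txt), and the handling of zero denominators are all assumptions made for the example. Step 12 is read here as the average of the two scores.

```python
from dataclasses import dataclass


@dataclass
class WaitRow:
    # Hypothetical shape of one row of the Wait Statistics Details collection.
    wait_type: str
    category: str         # assumed label; "IO" marks the IO category
    total_wait: float     # total overall wait
    signal_wait: float    # portion of the wait spent waiting for CPU
    waiting_tasks: float  # tasks waiting on this wait type


def performance_health(rows, idle_waits=frozenset()):
    """Apply steps 1-12 to a list of WaitRow and return the intermediate values."""
    # Ignore wait types listed as idle (the role IdleWaits.txt plays).
    rows = [r for r in rows if r.wait_type not in idle_waits]
    page_rows = [r for r in rows if r.wait_type.startswith("PAGEIOLATCH")]

    def io_wait(r):
        # IO wait = total wait less signal wait
        return r.total_wait - r.signal_wait

    page_io_latch_wait = sum(io_wait(r) for r in page_rows)              # step 1
    io_wait_total = sum(io_wait(r) for r in rows if r.category == "IO")  # step 2
    cpu_wait_total = sum(r.signal_wait for r in rows)                    # step 3
    total_wait = sum(r.total_wait for r in rows)                         # step 4
    page_tasks = sum(r.waiting_tasks for r in page_rows)                 # step 5

    # Assumption: the source does not say what happens with a zero denominator.
    io_latency = page_io_latch_wait / page_tasks if page_tasks else 0.0  # step 6
    cpu_and_io_total = io_wait_total + cpu_wait_total                    # step 7
    cpu_and_io_percent = (                                               # step 8
        100 * cpu_and_io_total / total_wait if total_wait else 100.0
    )
    db_time = cpu_and_io_total / 60                                      # step 9

    # Steps 10 and 11. The source states no lower bound, so none is applied.
    if io_latency < 5:
        io_latency_score = 100.0
    else:
        io_latency_score = 100 - 10 * ((io_latency - 5) / 5)
    if cpu_and_io_percent > 95:
        cpu_and_io_score = 100.0
    else:
        cpu_and_io_score = 100 - 10 * ((95 - cpu_and_io_percent) / 5)

    # Step 12: little workload counts as healthy, otherwise average the scores.
    if db_time < 25:
        health = 100.0
    else:
        health = (io_latency_score + cpu_and_io_score) / 2

    return {
        "io_latency": io_latency,
        "cpu_and_io_percent": cpu_and_io_percent,
        "db_time": db_time,
        "io_latency_score": io_latency_score,
        "cpu_and_io_score": cpu_and_io_score,
        "performance_health": health,
    }


# Made-up sample data, for illustration only.
example_rows = [
    WaitRow("PAGEIOLATCH_SH", "IO", 6000, 1000, 500),
    WaitRow("IO_COMPLETION", "IO", 3000, 500, 100),
    WaitRow("LCK_M_X", "Lock", 11000, 500, 50),
    WaitRow("LAZYWRITER_SLEEP", "Idle", 1_000_000, 0, 1),
]
result = performance_health(example_rows, idle_waits={"LAZYWRITER_SLEEP"})
print(result)
```

With this sample data the IO latency is 10 (score 90) and the CPU and IO percent is 47.5 (score 5), so the Performance Health score is 47.5 even though the IO latency score alone looks healthy.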
Step 8 seems to cause problems because it only measures the share of the good (CPU and IO) waits in the total, with the rest counted as bad waits, and does not factor in the magnitude of the waits. Step 12 is supposed to set the score to healthy (100) when there is little workload, but it does not seem to work correctly. This is going to be improved in version 12.0.
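The magnitude issue with step 8 can be illustrated with a small Python sketch. The function name and the sample numbers are made up for illustration; only the formula comes from steps 8 and 11. Both workloads are chosen large enough that the DB time check in step 12 (CPU and IO total divided by 60, compared with 25) would not override the result.

```python
def cpu_and_io_score(cpu_and_io_total, total_wait):
    """Steps 8 and 11: score based only on the CPU and IO share of total wait."""
    percent = 100 * cpu_and_io_total / total_wait
    if percent > 95:
        return 100.0
    return 100 - 10 * ((95 - percent) / 5)


# Two hypothetical instances with the same ratio but very different wait volumes.
modest = cpu_and_io_score(cpu_and_io_total=2_000, total_wait=4_000)
heavy = cpu_and_io_score(cpu_and_io_total=2_000_000, total_wait=4_000_000)

print(modest, heavy)
```

Both calls return the same score because only the ratio enters the formula, so an instance with a thousand times more waiting is rated the same as one with a modest amount.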