What steps are needed to resolve the "disk queue length" alarm?
Disk Queue Length alarm
The Disk Queue Length alarm becomes active when the maximum disk queue length of any disk exceeds a threshold.
Sustained high disk queue length is a good indicator that you have a disk subsystem bottleneck and usually means that I/O times will be degraded.
Note that since this is a Windows-based metric, it can be affected by I/O that was done by non-SQL Server processes.
The occurrence of this alarm does not necessarily indicate a problem with SQL Server; however SQL Server (and anything else on the machine that does disk I/O) will feel the impact of slower disk throughput.
When this alarm is current, you should look at:
The SQL I/O Activity page on the SQL Activity drilldown to see if SQL Server is generating high amounts of disk activity. The Physical I/O chart on this page shows the rate (I/O per second) for each type of I/O that SQL Server is doing. If SQL Server is not generating a lot of I/O activity, then it is most likely being caused by some other Windows process, or by Windows itself.
In Spotlight on Windows Enterprise, the Logical Disk Activity tab on the Disks drilldown to see which disk(s) are being hit the hardest. This and the list of disks that contain SQL Server databases (from the Databases grid on the Databases drilldown) may help to determine if SQL Server is generating the I/O.
The Sessions page on the SQL Activity drilldown to see which SQL Server processes are executing at the time, and the SQL currently being executed.
Consider moving database files to faster disks. If you are not using hardware RAID, consider purchasing a RAID subsystem. If you are using RAID-5 for write-intensive files (such as Database Logs or heavily updated database files), consider moving to a faster RAID implementation (RAID-0 or RAID-10).
In some cases, you can speed up all disk I/O by reviewing the RAID options on your RAID controllers. One example is to enable disk write caching, as long as your disk subsystem is protected by battery backups or UPS.
If the Checkpoint process is generating lots of I/O (see the Checkpoint statistic on the SQL Physical I/O chart of the SQL I/O Activity page), review your Recovery Interval setting in the Configuration drilldown.