Several alarms can be investigated using the Database Details panel of the Databases drilldown, as follows:
Recent Backups Alarm
The Recent Backups alarm becomes active when Foglight for SQL Server detects that any SQL Server database has not been backed up in the last few days.
To handle the recent backups alarm:
Database Unavailable Alarm
The Database Unavailable alarm becomes active when Foglight for SQL Server detects that a SQL Server database is not available for reading. Users attempting to access an unavailable database receive an error message.
This alarm detects unusual database statuses, including Suspect, Offline, Recovering, Loading, Restoring, Emergency Mode, and others.
When this alarm occurs, you should:
Some of the more common unavailable statuses are detailed in the following sections:
Offline
Setting databases offline can only be carried out manually, using the sp_dboption procedure. If any databases are Offline, consider using sp_dboption or ALTER DATABASE to bring the database online again.
Loading or restoring
Databases marked as Loading or Restoring are currently being restored by a RESTORE DATABASE or RESTORE LOG command. The database cannot be accessed by anyone while these commands are executed.
This status is also assigned to databases that have been restored using the NORECOVERY option. Specifying this parameter on a RESTORE statement notifies SQL Server that additional transaction logs need to be restored, and that no access to the database is permitted until these transactions are executed.
Check the Sessions panel on the SQL Activity drilldown for active sessions that are processing a RESTORE command (where the Last Command column contains Restore). If no sessions are processing a RESTORE command, the most likely reason for the database’s unavailability is that the last restore was carried out using the NORECOVERY keyword.
Removing the Loading/Restoring status requires completing the RESTORE process. This can involve either waiting for the active RESTORE command to complete, or restoring the remaining transaction logs. The last transaction log should be restored without the NORECOVERY keyword. If the database is mirrored, a Restoring status is shown on the mirror.
Recovering
Databases are Recovering (or InRecovery) for a while when SQL Server is restarted, or the database is first set online. This is the status SQL Server uses for indicating that it is re-applying committed transactions, or removing uncommitted transactions after a SQL Server failure.
Normally, re-applying these transactions should take only a short time; however, if any long-running transactions were open when SQL Server ended abnormally, this procedure can take an extended period.
In some cases, it is advisable to bypass the SQL Server recovery process. For example, it would make much more sense to skip a lengthy recovery process when planning to drop the database as soon as the recovery process completes. For details on skipping the recovery process, see Bypassing SQL Server recovery in the Foglight for SQL Server User's Guide.
CAUTION: Bypassing the recovery process can corrupt the database
Suspect
Databases can be Suspect if they fail SQL Server's automatic recovery. This status most commonly appears after a SQL Server restart, when the automatic recovery process carried out during restart has failed. Databases can also be marked as Suspect when serious database corruption is detected.
The first measure that should be taken when a Suspect database is detected is to check the SQL Server error log, and look for error messages indicating recovery failure or database corruption. These messages should indicate the problem’s cause.
To correct a suspect database, consider taking the following measures:
In most cases, a suspect database is best handled by restoring the database from the last good full database backup and transaction logs.
Using emergency mode
Emergency mode is a special status, which can be set on an individual database, thereby causing SQL Server to skip recovery for this specific database. In some cases, taking this measure can make the corrupt database available in order to extract data that cannot be retrieved in any other way.
Activating emergency mode causes SQL Server to skip the recovery of this database, thereby preventing the database being made suspect. However, the database may contain partially-complete transactions, and there may be inconsistencies between data and indexes (logical and physical corruptions). Do not carry out any database changes or updates when SQL Server is started in this way. Emergency Mode is documented at: http://support.microsoft.com/support/kb/articles/Q165/9/18.ASP.
Bypassing SQL Server recovery
Another high risk option to access a suspect database is to start SQL Server with Trace Flag 3608. This trace flag causes SQL Server to skip its automatic recovery process on ALL DATABASES when it starts. Again, this procedure may be sufficient for extracting data that cannot be retrieved in any other way.
With both Emergency Mode and Bypassing SQL Server Recovery, you may then be able to extract your data using BCP.EXE and/or script the database to get the latest database definitions. This can then be loaded into a new database using BCP.EXE or BULK INSERT. Be aware that the extracted data may not be complete.
File Group Utilization Alarm
The File Group Utilization alarm becomes active when a non-fixed size data file (that belongs to the file group) in any database is in danger of running out of space to grow.
This alarm is invoked whenever the space utilization percentage of a specific file group exceeds a predefined threshold value.
To resolve the data file growth limitation issue:
Resolve this issue by freeing up disk space on the disk on which the file resides.
Example
The File Group Utilization alarm is raised when the following scenario takes place:
Log Flush Wait Time Alarm
The Log Flush Wait Time alarm becomes active when the duration of the last log flush for a database exceeds a threshold.
Because users make modifications to SQL Server databases, SQL Server records these changes in a memory structure called the Log Cache. Each SQL Server database has its own log cache.
When a user transaction is committed (either explicitly, by means of a COMMIT statement, or implicitly), SQL Server writes all changes from the Log Cache out to the log files on disk. This process is called a log flush. The user that issued the commit must wait until the log flush is complete before they can continue. If the log flush takes a long time, this degrades the user's response time.
Foglight for SQL Server checks the log flush wait time for the last log flush performed for each database. If a database has a slow log flush, and then has no update activity (and therefore no more log flushes) for a long time, Foglight for SQL Server continues to report this as an alarm until another log flush is performed for that database.
To handle this alarm:
Disk Queue Length Alarm
The Disk Queue Length alarm becomes active when the disk queue length of any disk exceeds a threshold. Sustained high disk queue length may indicate a disk subsystem bottleneck, and usually results in degraded I/O times.
Disk queue length is a Windows-based metric. Therefore, occurrence of the Disk Queue Length alarm does not necessarily indicate a problem with the SQL Server instance, and can be the result of I/O operations carried out by non-SQL Server processes. Nevertheless, SQL Server, as well as any other application running on the computer for which this alarm is raised, is affected by slower disk throughput.
To handle the Disk Queue Length alarm: