Foglight for SQL Server (Cartridge) 5.9.5.10

The Database Unavailable alarm becomes active when Foglight for SQL Server detects that a SQL Server database is not available for reading. Users attempting to access an unavailable database receive an error message.

This alarm detects unusual database statuses, including Suspect, Offline, Recovering, Loading, Restoring, Emergency Mode, and others.
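As a rough illustration of how such a status check might be expressed, the following Python sketch filters a set of database statuses down to the unavailable ones. The status strings come from the list above; the input shape (a name-to-status mapping) and the function name are assumptions made for this example and are not part of Foglight.

```python
# Status names taken from the alarm description above. The exact strings
# reported by Foglight or SQL Server may differ (assumption).
UNAVAILABLE_STATUSES = {
    "suspect",
    "offline",
    "recovering",
    "loading",
    "restoring",
    "emergency mode",
}


def unavailable_databases(statuses):
    """Return the sorted names of databases whose status is unavailable.

    `statuses` maps database name -> status string (hypothetical input shape).
    The comparison ignores case and surrounding whitespace.
    """
    return sorted(
        name
        for name, status in statuses.items()
        if status.strip().lower() in UNAVAILABLE_STATUSES
    )


example = {"Sales": "Online", "HR": "Suspect", "Archive": "RESTORING"}
print(unavailable_databases(example))
```

The set contains only the statuses named in the text; the "and others" mentioned above would have to be added once their exact names are known.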

When this alarm occurs, you should:

• Determine which databases are unavailable. Check the Databases table on the Databases drilldown. The Status column shows which databases are unavailable.

• Take the action specified below for each unavailable database.

Some of the more common unavailable statuses are detailed in the following sections:

• Offline

• Loading or restoring

• Recovering

• Suspect

Offline

Databases can be set offline only manually, using the ALTER DATABASE statement or, on older SQL Server versions, the sp_dboption procedure. If any databases are Offline, consider using ALTER DATABASE (or sp_dboption, where it is still available) to bring the database online again.
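A minimal sketch, assuming the database name is already known, of composing the ALTER DATABASE statement that brings a database online. The helper names are hypothetical; the generated statement would still need to be run through your usual SQL client.

```python
def quote_identifier(name):
    """Wrap a SQL Server identifier in brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def set_online_statement(database):
    """Build the T-SQL statement that sets the given database online."""
    return "ALTER DATABASE " + quote_identifier(database) + " SET ONLINE;"


# "Sales" is a placeholder database name used only for illustration.
print(set_online_statement("Sales"))
```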

Loading or restoring

Databases marked as Loading or Restoring are currently being restored by a RESTORE DATABASE or RESTORE LOG command. The database cannot be accessed by anyone while these commands are executed.

This status is also assigned to databases that have been restored using the NORECOVERY option. Specifying this parameter on a RESTORE statement notifies SQL Server that additional transaction logs need to be restored, and that no access to the database is permitted until these transactions are executed.

Check the Sessions panel on the SQL Activity drilldown for active sessions that are processing a RESTORE command (where the Last Command column contains Restore). If no sessions are processing a RESTORE command, the most likely reason for the database’s unavailability is that the last restore was carried out using the NORECOVERY keyword.

Removing the Loading/Restoring status requires completing the RESTORE process. This involves either waiting for the active RESTORE command to complete, or restoring the remaining transaction logs. The last transaction log should be restored with the RECOVERY option, that is, without the NORECOVERY keyword. If the database is mirrored, the mirror copy is expected to show a Restoring status.
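The ordering rule described above (every log restored with NORECOVERY except the last) can be sketched as follows. This is an illustrative Python helper that only generates RESTORE LOG statements; the function name, database name, and backup file paths are assumptions, and the backups are assumed to be listed in the order they were taken.

```python
def restore_log_statements(database, log_backups):
    """Build RESTORE LOG statements for an ordered list of log backup files.

    All backups except the last are restored WITH NORECOVERY so that further
    logs can still be applied; the last one uses WITH RECOVERY, which ends
    the Restoring status and makes the database accessible.
    """
    if not log_backups:
        raise ValueError("at least one log backup is required")

    db = "[" + database.replace("]", "]]") + "]"
    statements = []
    last_index = len(log_backups) - 1
    for index, path in enumerate(log_backups):
        safe_path = path.replace("'", "''")  # escape quotes in the literal
        option = "RECOVERY" if index == last_index else "NORECOVERY"
        statements.append(
            f"RESTORE LOG {db} FROM DISK = N'{safe_path}' WITH {option};"
        )
    return statements


# Placeholder database name and backup paths, for illustration only.
for statement in restore_log_statements(
    "Sales", [r"D:\backup\sales_log1.trn", r"D:\backup\sales_log2.trn"]
):
    print(statement)
```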

Recovering

Databases are Recovering (or InRecovery) for a short time when SQL Server is restarted, or when the database is first set online. SQL Server uses this status to indicate that it is re-applying committed transactions, or removing uncommitted transactions, after a SQL Server failure.

Normally, re-applying these transactions should take only a short time; however, if any long-running transactions were open when SQL Server ended abnormally, this procedure can take an extended period.

In some cases, it is advisable to bypass the SQL Server recovery process. For example, it makes sense to skip a lengthy recovery when you plan to drop the database as soon as the recovery completes. For details on skipping the recovery process, see Bypassing SQL Server recovery.

CAUTION: Bypassing the recovery process can corrupt the database.

Suspect

Databases can be Suspect if they fail SQL Server's automatic recovery. This status most commonly appears after a SQL Server restart, when the automatic recovery process carried out during restart has failed. Databases can also be marked as Suspect when serious database corruption is detected.

The first measure that should be taken when a Suspect database is detected is to check the SQL Server error log, and look for error messages indicating recovery failure or database corruption. These messages should indicate the problem’s cause.

To correct a suspect database, consider taking the following measures:

• Check the SQL Server error log to determine why the database was made suspect.

• Ensure that all database files are available. If any database file is unavailable when SQL Server attempts to open the database, the database is made suspect. This can happen if a database file was deleted or renamed while SQL Server was down. It can also happen if another Windows process, such as backup or virus-scanning software, is using a database file when SQL Server tries to open it.

In such a case, follow this procedure:

1. Wait for the database file to become available again.

2. Use the sp_resetstatus stored procedure (documented in Microsoft SQL Server’s Books Online) to reset the database status.

3. Restart SQL Server to initiate recovery.

• If the Suspect status was caused by a full disk during recovery, free up disk space and use the sp_resetstatus stored procedure (documented in Microsoft SQL Server’s Books Online) to reset the database status. Then restart SQL Server to initiate recovery.

• If the Suspect status was caused by a full disk during recovery, and it is not possible to free up space on existing database disks, add a new data or log file on a different disk that has free space available.

• Restore the database from the last full database backup, and then restore all transaction log backups taken since that point.

In most cases, a suspect database is best handled by restoring the database from the last good full database backup and transaction logs.

Using emergency mode

Emergency mode is a special status, which can be set on an individual database, thereby causing SQL Server to skip recovery for this specific database. In some cases, taking this measure can make the corrupt database available in order to extract data that cannot be retrieved in any other way.

Activating emergency mode causes SQL Server to skip the recovery of this database, thereby preventing the database from being made suspect. However, the database may contain partially complete transactions, and there may be inconsistencies between data and indexes (logical and physical corruptions). Do not carry out any database changes or updates while the database is in emergency mode. Emergency Mode is documented at: http://support.microsoft.com/support/kb/articles/Q165/9/18.ASP.

Bypassing SQL Server recovery

Another high-risk option for accessing a suspect database is to start SQL Server with Trace Flag 3608. This trace flag causes SQL Server to skip its automatic recovery process at startup on all databases other than master. Again, this procedure may be sufficient for extracting data that cannot be retrieved in any other way.

• Use the sp_resetstatus stored procedure (documented in Microsoft SQL Server’s Books Online) to reset the database status of any Suspect databases.

• Stop SQL Server, and then start it from a command line with Trace Flag 3608 and minimal startup (sqlservr.exe -f -c -T3608). This setting causes SQL Server to skip its automatic recovery at startup, thereby preventing the database from being made suspect. However, the database may contain partially complete transactions, and there may be inconsistencies between data and indexes (logical and physical corruptions). Do not carry out any database changes or updates when SQL Server is started in this way.

With both Emergency Mode and Bypassing SQL Server Recovery, you may then be able to extract your data using BCP.EXE and/or script the database to get the latest database definitions. The extracted data can then be loaded into a new database using BCP.EXE or BULK INSERT. Be aware that the extracted data may not be complete.
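As a hedged sketch of the extraction step, the following Python helper assembles a bcp export command line for one table. The database, schema, table, server, and file names are placeholders, and the choice of Windows authentication (-T) and native format (-n) is an assumption for this example; adjust them to your environment.

```python
def bcp_export_command(database, schema, table, data_file, server):
    """Build the argument list for exporting one table with the bcp utility.

    -S names the SQL Server instance, -T requests a trusted (Windows
    authentication) connection, and -n selects native data format.
    """
    qualified_table = f"{database}.{schema}.{table}"
    return ["bcp", qualified_table, "out", data_file, "-S", server, "-T", "-n"]


# Placeholder names, for illustration only.
command = bcp_export_command("Sales", "dbo", "Orders", "orders.dat", "MYSERVER")
print(" ".join(command))
```

The argument list can be passed to a process launcher such as subprocess.run once the placeholder values are replaced. Because the source database may be inconsistent, the exported file should be checked before it is loaded elsewhere.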

File Group Utilization Alarm

The File Group Utilization alarm becomes active when a non-fixed size data file (that belongs to the file group) in any database is in danger of running out of space to grow.

This alarm is invoked whenever the space utilization percentage of a specific file group exceeds a predefined threshold value.

To resolve the data file growth limitation issue:

1. Under the Databases drilldown, click the Data Files panel.

2. Check for files with AutoGrow=Yes; files in danger of filling have a low free percentage value (displayed in the Free Pct column).

3. Resolve this issue by freeing up disk space on the disk on which the file resides.

Example

The File Group Utilization alarm is raised when the following scenario takes place:

• All data files in the file group are 95% full, all of these files are configured with the AutoGrow option set to Yes and, given the current growth increment, the data files have a limited number of growths remaining before all available disk space is consumed.
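The arithmetic behind this scenario can be sketched as follows. This is an illustrative calculation only, not Foglight's actual alarm logic; the function names and the sample figures (file size, free disk space, growth increment) are assumptions.

```python
def utilization_pct(used_mb, size_mb):
    """Percentage of the allocated file size that is currently in use."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return 100.0 * used_mb / size_mb


def remaining_growths(free_disk_mb, growth_increment_mb):
    """Number of whole growth increments that still fit on the disk."""
    if growth_increment_mb <= 0:
        raise ValueError("growth_increment_mb must be positive")
    return int(free_disk_mb // growth_increment_mb)


# Hypothetical figures: a 10,000 MB file with 9,500 MB used, on a disk with
# 1,200 MB free, growing in 512 MB increments.
print(utilization_pct(9500, 10000))
print(remaining_growths(1200, 512))
```

With these sample figures the file is 95% full and only two further growths fit on the disk, which is the kind of situation the example above describes.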

Log Flush Wait Time Alarm

The Log Flush Wait Time alarm becomes active when the duration of the last log flush for a database exceeds a threshold.

As users make modifications to SQL Server databases, SQL Server records these changes in a memory structure called the Log Cache. Each SQL Server database has its own log cache.

When a user transaction is committed (either explicitly, by means of a COMMIT statement, or implicitly), SQL Server writes all changes from the Log Cache out to the log files on disk. This process is called a log flush. The user that issued the commit must wait until the log flush is complete before they can continue. If the log flush takes a long time, this degrades the user's response time.

Foglight for SQL Server checks the log flush wait time for the last log flush performed for each database. If a database has a slow log flush, and then has no update activity (and therefore no more log flushes) for a long time, Foglight for SQL Server continues to report this as an alarm until another log flush is performed for that database.
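The behavior described above, where a slow flush keeps the alarm active until the next flush occurs, can be modeled with a short Python sketch. This is a simplified model under stated assumptions, not Foglight's implementation: each sample is the wait time in milliseconds of a flush seen in that collection interval, or None when no flush took place, and the threshold value is hypothetical.

```python
def log_flush_alarm_states(samples, threshold_ms):
    """Return one alarm state (True/False) per collection interval.

    The state is based on the most recent flush seen so far, so intervals
    without a flush (None) inherit the state of the previous flush.
    """
    last_wait = None
    states = []
    for sample in samples:
        if sample is not None:
            last_wait = sample  # a new flush replaces the remembered value
        states.append(last_wait is not None and last_wait > threshold_ms)
    return states


# A slow flush (120 ms) followed by two idle intervals, then a fast flush.
print(log_flush_alarm_states([5, 120, None, None, 8], threshold_ms=100))
```

In this model the alarm stays active through the two idle intervals and clears only when the fast flush arrives, which is why forcing a new flush on an idle database can clear a stale alarm.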

To handle this alarm:

• On the Databases drilldown, select the Summary panel to review the Log Flush Wait Time counter in the Database History graph. The database with the high graph values is the one experiencing the problem. If a database has a consistently high value that never changes, run the SQL command CHECKPOINT on that database to force another log flush, and then check the value in Foglight for SQL Server again.

• Select the Transaction Logs panel on the Databases drilldown to find the disks on which the log for this database resides.

• Consider moving the log files to disks that support fast write activity (for example, a fast RAID controller with write-back caching enabled).

• Consider moving log files off RAID-5 devices, as these are optimized for read activity, and log files generate mainly write activity.

Disk Queue Length Alarm

The Disk Queue Length alarm becomes active when the disk queue length of any disk exceeds a threshold. Sustained high disk queue length may indicate a disk subsystem bottleneck, and usually results in degraded I/O times.

Disk queue length is a Windows-based metric. Therefore, occurrence of the Disk Queue Length alarm does not necessarily indicate a problem with the SQL Server instance, and can be the result of I/O operations carried out by non-SQL Server processes. Nevertheless, SQL Server, as well as any other application running on the computer for which this alarm is raised, is affected by slower disk throughput.
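To illustrate what "sustained" might mean in practice, the following Python sketch flags a disk only when the queue length stays above a threshold for several consecutive samples, so that a single spike is ignored. The function name, threshold, and sample values are assumptions for this example; Foglight's own evaluation may differ.

```python
def is_sustained(samples, threshold, min_consecutive):
    """Return True if `min_consecutive` samples in a row exceed `threshold`."""
    run_length = 0
    for value in samples:
        # Extend the current run on a high sample, otherwise start over.
        run_length = run_length + 1 if value > threshold else 0
        if run_length >= min_consecutive:
            return True
    return False


# Hypothetical queue-length samples collected at regular intervals.
print(is_sustained([1, 9, 2, 8, 9, 7], threshold=5, min_consecutive=3))
print(is_sustained([1, 9, 2, 9, 1], threshold=5, min_consecutive=2))
```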

To handle the Disk Queue Length alarm:

• On the SQL Activity drilldown, click the SQL I/O Activity panel and look at the SQL Server Physical I/O chart to see whether SQL Server is generating high amounts of disk activity. This chart displays the rate (I/O per second) for each type of I/O that SQL Server is performing. If SQL Server is not generating a lot of I/O activity, the high disk queue length is most likely being caused by some other Windows process, or by Windows itself.

• On the SQL Activity drilldown, click the Sessions panel to see which SQL Server processes are executing while the alarm is active, and the SQL currently being executed.

• Consider moving database files to faster disks. If you are not using hardware RAID, consider purchasing a RAID subsystem. If you are using RAID-5 for write-intensive files (such as database logs or heavily updated database files), consider moving to a faster RAID implementation (RAID-0 or RAID-10).

• In some cases, you can speed up all disk I/O by reviewing the RAID options on your RAID controllers. One example is to enable disk-write caching, as long as your disk subsystem is protected by battery backups or UPS.

• On the SQL Activity drilldown, click the SQL I/O Activity panel and look at the SQL Server Physical I/O chart to view the Checkpoint statistic. If the Checkpoint process is generating a lot of I/O, review the Recovery Interval setting in the Configuration drilldown.
