Disk errors are sometimes detected on a volume that hosts an AppAssure/Rapid Recovery Archive or Repository. A normal reaction to disk error events is to run a disk checking utility such as chkdsk or scandisk to determine which sectors are damaged. Let’s assume that you run chkdsk in read-only mode and it discovers a bad sector.
If you run chkdsk /f /r on a failed archive volume, chkdsk may result in data loss for the recovery point chain(s) included in the archive itself. Running chkdsk /f /r on a failed repository volume may result in repository corruption.
What is a bad sector?
A sector is the smallest storage area on a disk that a file system can read information from or write information to. Over time, sectors can go bad from physical damage or from software damage. Physical damage to the disk surface is generally not repairable, but the Operating System can mark the affected sectors as “bad” and skip over them in the future. Software damage can result from a power failure while data is being written to a sector or, more commonly, from virus and malware infections.
When a bad sector develops on a volume, there is a potential for data corruption on that volume. Remember, a bad sector cannot be repaired by the Operating System but it can be marked as unusable. Once marked as unusable, the Operating System will know not to attempt to store data in that bad sector. Reads and writes to the “bad” sector will be reallocated to another sector on the disk. The storage capacity of the disk will be decreased by the amount of storage space in the bad sector.
If your hard drive develops a bad sector, back the hard drive up immediately. If the bad sector was caused by a faulty drive head, the problem can quickly spread to other sectors on the disk and impact a larger number of files as well.
Can a bad sector be repaired?
A common utility to fix bad sectors is Microsoft chkdsk. When used with no arguments, chkdsk runs in Read-Only mode: it detects and reports problems but does not change anything on the disk. Two common arguments used to repair bad sectors are /f and /r. Let’s take a moment to review what these arguments do:
chkdsk /f looks for any disk errors and attempts to “fix” bad blocks found on the disk. Fixing bad blocks is done by flushing zeros to the block. This can result in data loss if any data was in fact written to these blocks.
chkdsk /r (which also runs /f when called alone) looks for and identifies bad sectors on a disk. Should a bad sector be found, any readable information is moved to another sector in an effort to recover the data on that sector. As there is no guarantee that all the data in the sector was recovered and successfully moved to another sector, you can still suffer data loss.
What happens when chkdsk /f /r is run on an AppAssure/Rapid Recovery Archive or Repository Volume?
Let’s assume that you have an archive that you are trying to import and the import is failing. After reviewing the Windows logs, you see that there are disk errors on the Archive volume and you decide to run chkdsk /F /R. What can happen?
Well, chkdsk /F /R will look for bad blocks and bad sectors on the disk. If bad blocks are detected, Zeros are flushed to each bad block rendering it usable again. Note: whatever data was originally in the block is lost and is not recoverable. Chkdsk /F /R will also look for bad sectors on the disk. If any bad sectors are found, any readable data is moved to another usable sector on the disk before the original sector is repaired.
What does this mean for AppAssure/Rapid Recovery data on the repaired volume?
There really is no way to know for certain which blocks or sectors were identified as bad. It is even harder to say for certain whether AppAssure/Rapid Recovery archive data lives in a bad block or sector. The best case scenario is that the disk error is detected in an unused section of the disk and does not impact the current archive.
The worst case scenario is that the disk error is detected in a section used by the archive. In this case, bad sectors are repaired and zeros are flushed to bad blocks. Should you be able to import an Archive after running chkdsk /F /R in this instance, incremental recovery points or even base images will be lost for the imported Agent. There is no way to recover this information, and the loss may prove costly depending on the age and nature of the archive.
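To illustrate why losing a single recovery point can be costly, here is a minimal sketch. It assumes a simplified model in which each incremental depends on every recovery point before it in the chain; the function and point names are hypothetical and are not part of AppAssure/Rapid Recovery.

```python
def recoverable_points(chain, damaged):
    """Return the recovery points that can still be restored.

    chain   -- recovery point names, base image first, then incrementals in order
    damaged -- set of names whose data was zeroed or lost (assumed known here)
    """
    usable = []
    for point in chain:
        if point in damaged:
            break  # in this model, every later incremental depends on this one
        usable.append(point)
    return usable


chain = ["base", "inc1", "inc2", "inc3"]
print(recoverable_points(chain, {"inc2"}))  # ['base', 'inc1']
print(recoverable_points(chain, {"base"}))  # []
```

In this model, damage to the base image leaves nothing restorable, while damage to a later incremental only cuts the chain from that point onward.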
So what happens if chkdsk /F /R is run on a repository volume? Chkdsk /F /R will still run in the fashion described above, moving data from sector to sector and flushing bad blocks with zeros. The concern here is the type of files a repository contains. The DFS.Records file is the repository database itself, and any chkdsk changes to it can introduce corruption into the repository. Likewise, if chkdsk makes any changes to the repository metadata, Null values can be introduced into the metadata, which will then lead to corruption.
Chkdsk /F /R can also determine that Sector 1 is bad and move the data to Sector 9. When the repository goes to mount the base image, the metadata (which was not modified by chkdsk) is still going to call Sector 1, as that is where it thinks the data value $ lives. Chkdsk /F /R does not communicate disk changes to the repository, so the metadata is never updated with the new storage locations. When the repository reads an unexpected or Null value, DVM will throw an error (this can also be a form of repository corruption).
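The Sector 1 to Sector 9 example can be sketched as a toy model. This is only an illustration of the scenario described above, not how chkdsk or DVM is actually implemented; the dictionaries, the `base_image_value` key, and the helper names are all hypothetical.

```python
# Toy model: the "disk" maps sector numbers to contents, and the repository
# metadata separately records which sector holds each value.
disk = {1: "$", 9: None}
metadata = {"base_image_value": 1}  # hypothetical metadata entry


def relocate(disk, bad, spare):
    """Mimic a repair tool moving readable data off a bad sector."""
    disk[spare] = disk[bad]
    disk[bad] = None  # the old sector no longer holds the data


def read_via_metadata(disk, metadata, key):
    """Read a value the way the repository would: through its own metadata."""
    return disk[metadata[key]]


before = read_via_metadata(disk, metadata, "base_image_value")
relocate(disk, bad=1, spare=9)  # the metadata is not told about the move
after = read_via_metadata(disk, metadata, "base_image_value")
print(before, after)  # $ None
```

The data still exists in Sector 9, but the metadata keeps pointing at Sector 1, so the read returns an unexpected Null value.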
Do not run chkdsk /f /r on an archive or repository volume.
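One way to enforce this rule in a maintenance script is a small guard around the command line. The sketch below is an assumption about how such a script might look: the function name and the list of protected volumes are hypothetical, and it only builds the argument list without running anything.

```python
def chkdsk_command(volume, protected_volumes, repair=False):
    """Build a chkdsk argument list, refusing repair mode on protected volumes."""
    volume = volume.upper()
    protected = {v.upper() for v in protected_volumes}
    if repair and volume in protected:
        raise ValueError(
            f"refusing chkdsk /f /r on {volume}: archive or repository volume"
        )
    # With no switches, chkdsk only reports problems (read-only mode).
    return ["chkdsk", volume] + (["/f", "/r"] if repair else [])


# Read-only checks are always allowed, even on a repository volume.
print(chkdsk_command("e:", ["E:"]))  # ['chkdsk', 'E:']
```

Requesting `repair=True` for a volume in the protected list raises an error instead of returning a command.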
Copy either the archive or the repository to another healthy disk (remember to use a program like ROBOCOPY or TeraCopy for the process). Once the data copy job is complete, try to import the archive again from the healthy disk. For a repository, attempt to reattach the repository (you may need to back up the repository registry key and then delete it before using “open existing repository”).
Attempt to replicate the repository to another Core and move protection there while repairing the original machine.
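After copying an archive or repository to a healthy disk, it can help to confirm that the copy matches the source before importing or reattaching it. The following is a minimal sketch of such a check using SHA-256 digests; the function names are hypothetical, and it assumes the source files are still readable (a read from a failing disk may raise an OSError, which this sketch does not handle).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source, destination):
    """Return relative paths that are missing from or differ in destination."""
    source, destination = Path(source), Path(destination)
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        if not dst_file.is_file() or file_digest(src_file) != file_digest(dst_file):
            problems.append(str(rel))
    return problems
```

An empty list from `compare_trees` means every file under the source was found with identical content under the destination; any path it returns should be recopied before you attempt the import.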