Repository corruption has been detected. How to deal with the situation?
You may see the following error in the logs:
Replay.Core.Contracts.DedupeVolumeManager.DedupeVolumeOperationException depth 0: A bad record has been read. There may be a problem with the repository or the underlying storage device
System.AggregateException depth 0: One or more errors occurred. (0x80131500) ---> (Inner Exception #0 depth 0) System.AggregateException depth 1: One or more errors occurred. (0x80131500) ---> (Inner Exception #0 depth 1) System.AggregateException depth 2: One or more errors occurred. (0x80131500) ---> (Inner Exception #0 depth 2) System.InvalidOperationException depth 3: CompleteAdding may not be used concurrently with additions to the collection. (0x80131509)
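To check whether a log contains this error, a simple text scan is usually enough. The sketch below is illustrative only: the function name `find_bad_record_lines` and the sample lines are hypothetical, and the two marker strings are copied from the error text above. It makes no assumption about where the Core stores its logs; pass it the lines of whichever log file you are reviewing.

```python
# Marker strings copied from the error text quoted above.
BAD_RECORD_MARKERS = (
    "DedupeVolumeOperationException",
    "A bad record has been read",
)


def find_bad_record_lines(lines):
    """Return (line_number, text) pairs for lines that contain any marker."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if any(marker in text for marker in BAD_RECORD_MARKERS):
            hits.append((number, text.rstrip("\n")))
    return hits


# Hypothetical sample input; in practice, pass an open log file instead.
sample = [
    "INFO transfer started\n",
    "ERROR Replay.Core.Contracts.DedupeVolumeManager."
    "DedupeVolumeOperationException depth 0: A bad record has been read.\n",
    "INFO transfer finished\n",
]

for number, text in find_bad_record_lines(sample):
    print(number, text)
```

A match only shows that the event was logged. It does not confirm that the repository is actually corrupted.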
A repository can become corrupted for multiple reasons. Hardware failures, power outages, and service crashes can corrupt disk and file structures, which may affect repository files. Forcefully terminating the Core service while it is writing to the repository will also almost certainly lead to corruption.
We recommend that you contact Support to open a case so we can confirm the root cause of every corruption case. DVM bad record events may be false positives and need to be investigated before we recommend further action. We recommend running a full repository check and then uploading the logs so we can see what the check finds. If there are hardware events consistent with data loss, those concerns need to be addressed before moving forward.
Run the Repository Repair Tool (RRT). This is the most reliable way to fix repository corruption, but it is also the most time-consuming. For the duration of the repair, the Core machine is not available to take backups. The tool requires several repository checks and 2 to 3 utility scans to make sure all bad data blocks are identified and safely removed.
The average time needed per scan is 15 hours per TB of protected (uncompressed) data, and the whole process can take up to 2-3 weeks to complete, depending on the size of the repository.
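As a rough planning aid, the scan time can be estimated from the figures above. This is a minimal sketch: the function name `estimate_scan_hours` and the 8 TB example size are assumptions for illustration. It covers the utility scans only and does not include the time taken by the repository checks.

```python
HOURS_PER_TB_PER_SCAN = 15  # average figure quoted in the text above


def estimate_scan_hours(protected_tb, scans):
    """Estimate total scan time in hours for the given number of scans."""
    return HOURS_PER_TB_PER_SCAN * protected_tb * scans


# Hypothetical repository with 8 TB of protected (uncompressed) data.
for scans in (2, 3):
    hours = estimate_scan_hours(8, scans)
    print(f"{scans} scans: {hours} hours (~{hours / 24:.1f} days)")
```

For this example, the estimate is 240 hours (about 10 days) for 2 scans and 360 hours (about 15 days) for 3 scans, which is in line with the multi-week duration mentioned above.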
Another option, which requires enough additional storage, avoids losing all the backed-up data:
- Replicate data to another Core/Repository
- Archive all agents one by one to a separate storage location. Please note that some archives may fail when corruption is encountered. The repository structure is block-based, and the same corrupted block may be referenced by different agents' Recovery Points, making them unavailable. It is recommended to archive each agent to a separate archive folder.
- Delete and recreate the repository.
- Attach/Consume archives as needed.
*WARNING* Please contact Support before deleting the repository.
ONCE THE REPOSITORY IS DELETED, ALL RECOVERY POINTS ARE DELETED AND THERE'S NO WAY TO RECOVER OLD DATA.
The fastest way to resolve the corruption is to delete the repository and then recreate it.
Please note that if rollups are failing for a specific agent due to the corruption, deleting the affected agent's Recovery Points will not help. Because of deduplication, corrupted blocks may affect multiple agents, and the rollup failures will return in the future.
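The effect of deduplication can be illustrated with a toy model. This is not the actual repository format: the agent names, block IDs, and function names below are hypothetical and only show how one shared block can be referenced by Recovery Points of several agents.

```python
# Toy model: each Recovery Point is a list of block IDs held in one shared,
# deduplicated block store. All names and IDs here are made up.
recovery_points = {
    ("agent-A", "rp1"): ["b1", "b2", "b3"],
    ("agent-A", "rp2"): ["b1", "b4"],
    ("agent-B", "rp1"): ["b2", "b5"],
    ("agent-C", "rp1"): ["b6"],
}


def affected_recovery_points(points, corrupted_block):
    """Return the (agent, recovery point) keys that reference the block."""
    return sorted(
        key for key, blocks in points.items() if corrupted_block in blocks
    )


def affected_agents(points, corrupted_block):
    """Return the agents with at least one affected Recovery Point."""
    hits = affected_recovery_points(points, corrupted_block)
    return sorted({agent for agent, _ in hits})


print(affected_agents(recovery_points, "b2"))

# Deleting agent-A's Recovery Points does not remove the shared block:
# agent-B still references it.
remaining = {k: v for k, v in recovery_points.items() if k[0] != "agent-A"}
print(affected_agents(remaining, "b2"))
```

In this model, block `b2` is referenced by both agent-A and agent-B, so removing agent-A's Recovery Points still leaves agent-B exposed to the same corrupted block.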