As a precaution, always attempt to stop the Core service gracefully. This means:
- stop or cancel all running tasks whenever possible. If the Web UI is not responding for any reason, a PowerShell cmdlet can stop all active tasks: stop-activejobs -all. Repeat this command several times if needed.
- if no active tasks are running but the Core service still fails to stop gracefully after a long time, open Resource Monitor (available from the Performance tab of Task Manager) and switch to the Disk tab. Then find and select the Core service in the list of processes with disk activity. This filters out the other processes and gives a clearer view of the Core service's disk activity in the lower pane.
- if the Core service has not appeared in Resource Monitor among the processes with disk activity for a long time, then it is safe to terminate the Core service forcefully.
Note: stopping the Core service through Services may take longer than the standard timeout and may trigger the warning: "The service did not respond in a timely fashion". This message is normal. The Core service performs multiple disk-intensive tasks when it starts and stops, and this may exceed the Services timeout.
Workarounds to resolve the repository corruption:
1) Run the Repository Repair Tool (RRT). This is the most reliable way to fix the repository corruption, but it is also the most time-consuming one. During that entire time, the Core machine is not available to take backups. Depending on the size of the repository, it may take weeks or even months to go through all RRT steps. The RRT process must be run twice, requires free local disk space equal to about 2% of the repository size, and may take about 5-10 hours, and in some cases up to 15 hours, per terabyte of repository storage (depending on the hardware and storage speed).
Please contact Support for detailed instructions about the RRT process.
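To get a feel for these numbers before committing to RRT, the figures above can be turned into a rough estimate. The following is only a sketch: the function name and parameters are hypothetical, and it assumes that the 5-15 hours per terabyte applies to each of the two RRT runs. If Support states that the figure covers both runs, set passes=1.

```python
def estimate_rrt(repo_tb, passes=2, hours_per_tb=(5, 15), free_space_ratio=0.02):
    """Rough RRT planning figures based on the numbers quoted in this article.

    Assumption: the hours-per-terabyte range applies to each RRT pass.
    """
    low, high = hours_per_tb
    return {
        "min_hours": repo_tb * low * passes,
        "max_hours": repo_tb * high * passes,
        # Free local disk space needed, about 2% of the repository size
        "free_space_tb": repo_tb * free_space_ratio,
    }


# Example: a 20 TB repository
est = estimate_rrt(20)
print(f"RRT time: {est['min_hours']}-{est['max_hours']} hours")
print(f"That is {est['min_hours'] / 24:.1f}-{est['max_hours'] / 24:.1f} days without backups")
print(f"Free local disk space needed: about {est['free_space_tb']:.2f} TB")
# RRT time: 200-600 hours
# That is 8.3-25.0 days without backups
# Free local disk space needed: about 0.40 TB
```

Actual run times depend heavily on the hardware and storage speed, so treat the result as an order-of-magnitude figure only.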
2) Repair a copy of the repository on a temporary Core. This requires the same amount of free storage as the current repository.
- install a temporary Core machine with the same Core version. Note: DL appliance version 18.104.22.168 matches software Core 22.214.171.124, and 126.96.36.1993 matches software Core 188.8.131.525
- on the original Core, stop the Core service
- copy the entire repository folders (data and metadata) to another location
- attach this new location to the temporary Core machine
- run RRT on the temporary Core machine until finished
- when RRT has finished, archive the time range for all agents on the source Core and import the archives into the temporary Core, which now holds the fixed repository
- copy the fixed repository back (stop the Core service before each copy)
This method allows the current Core to continue functioning while the temporary Core repairs the copy. Please note that some maintenance tasks, such as rollups, may fail when a damaged block is encountered.
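Because this method needs as much free storage as the repository itself, it is worth checking the target location before stopping the Core service. A minimal sketch, assuming the repository data and metadata folders are reachable as ordinary file system paths; the function names are hypothetical and not part of the Core software:

```python
import os
import shutil


def folder_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def has_room_for_copy(repo_folders, target_path):
    """Return (bytes needed, bytes free on target, True if the copy fits)."""
    needed = sum(folder_size_bytes(folder) for folder in repo_folders)
    free = shutil.disk_usage(target_path).free
    return needed, free, free >= needed
```

For example, has_room_for_copy([data_folder, metadata_folder], target_folder) returns the size of both repository folders, the free space at the target, and whether the copy fits. The three folder arguments are placeholders for the actual repository and target paths.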
3) Archive all agents, recreate the repository and import the archives. This also requires sufficient additional storage, but it is the fastest method so far that does not lose all of the backed-up data.
- archive all agents one by one to a separate storage location. Please note that some archives may fail when corruption is encountered. The repository structure is block-based, and the same corrupted block may be referenced by Recovery Points of different agents, making them unavailable. It is recommended to archive each agent to a separate archive folder.
- destroy and recreate the repository from scratch
- import all successful archives back
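Keeping one archive folder per agent makes it easy to track which archives completed and can be imported back. The bookkeeping could look like the sketch below. It is purely illustrative: the helper names, the archive root and the agent names are hypothetical, and the archive and import jobs themselves still have to be run from the Core.

```python
import re
from pathlib import PureWindowsPath


def archive_folder_for(agent_name, archive_root):
    """Build a separate archive folder path for one agent."""
    # Replace characters that are awkward in folder names
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", agent_name)
    return PureWindowsPath(archive_root) / safe_name


def importable_archives(results, archive_root):
    """results maps agent name -> True if its archive job completed.

    Returns only the folders that are safe to import back.
    """
    return {
        agent: archive_folder_for(agent, archive_root)
        for agent, completed in results.items()
        if completed
    }


# Hypothetical outcome of the archive jobs
results = {"SQL01.corp.local": True, "file server": False, "EXCH02": True}
for agent, folder in importable_archives(results, r"E:\Archives").items():
    print(agent, "->", folder)
```

Agents whose archive failed are left out of the import list, so their Recovery Points affected by the corrupted blocks are not carried into the new repository.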
The impact of the corrupted repository can be somewhat reduced if replication was set up well in advance and the data was replicated before the corruption occurred.