Symptoms:
Multiple datacopy, duplicate or consolidate jobs are hung in a state of "waiting for media", there is no activity on any of the tape drives, and no job is in a state of writing to media.
Description:
Multiple datacopy, duplicate or consolidate jobs are hung in a state of "waiting for media". To execute, each of these jobs needs all of its media and drive requests to be fulfilled.
The cause is a media request deadlock:
Job 'A' has been assigned the target media and drive and is waiting for the source media XXXX to become available.
Job 'B' has been assigned source media XXXX and is waiting for the same target drive to become available.
In this situation neither job can progress, and each blocks the other. To resolve the blockage, abort one of the jobs involved in the deadlock and restart it once the other job's processing is underway.
Here is a troubleshooting guide to allow you to identify the 'problematic' job that needs aborting:
Step [1] - determine what media/drive each job was waiting on
by performing a diagnose on the relevant media requests within the device manager.
Note down which tapes and drives are locked by that task/job, and which tape or drive it requires but is 'locked by another task'.
This information is available within the media diagnose GUI.
Step [2] - establish which job was locking the required media
- Enter Media Manager
- Select the required tape (XXXX)
- Select status
- Select the batch status tab from the resultant display
This displays the job ID that was assigned the media.
Step [3] - determine what jobs were locking the drives
- Enter the device manager
- Select the relevant drive (as detailed within the media diagnose details)
- Select the status option
- Select the batch status tab from the resultant display
This displays the job ID that was assigned the drive
Having listed all this information, you should be able to determine the mutually exclusive media requirements for the two jobs that are causing the bottleneck and choose the one to abort.
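As an illustration only, the notes gathered in Steps [1] to [3] can be treated as a small "wait-for" graph: each job holds some resources and waits on one, and a deadlock is a cycle in that graph. The sketch below is not part of the product; the function name, the dictionary layout and the resource labels are assumptions made for the example, and the data is entered by hand from what the GUI displays.

```python
from typing import Dict, List, Optional


def find_deadlock(holds: Dict[str, List[str]],
                  waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the jobs that form a wait cycle, or None if there is none.

    holds:     job -> resources (tapes/drives) noted as locked by that job
    waits_for: job -> the one resource noted as 'locked by another task'
    """
    # Map each resource to the job that currently holds it.
    owner = {res: job for job, resources in holds.items() for res in resources}

    for start in waits_for:
        path: List[str] = []
        job: Optional[str] = start
        # Follow "job -> wanted resource -> holding job" until the chain
        # ends or comes back to a job already visited.
        while job is not None and job not in path:
            path.append(job)
            wanted = waits_for.get(job)
            job = owner.get(wanted) if wanted is not None else None
        if job is not None:
            # The chain looped back: everything from that job on is the cycle.
            return path[path.index(job):]
    return None


# Hand-entered notes matching the example in the Description (hypothetical labels).
holds = {"A": ["drive 1"], "B": ["media XXXX"]}
waits_for = {"A": "media XXXX", "B": "drive 1"}

cycle = find_deadlock(holds, waits_for)
print(cycle)  # ['A', 'B']
```

Any job in the returned list is a candidate to abort; which one to choose remains a judgment call for the operator.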
There are two main potential causes of these deadlocks - conflict for target media & conflict for source media.
#1 - Target media conflict -
Two duplicate/consolidate/datacopy jobs can each lock a tape drive for their target media.
If the library has only 2 drives, both jobs can 'grab' a drive, leaving no free drive for the source media.
Both jobs then wait for the source media, and other jobs that require the drives are held in the queue until the duplicate/consolidate/datacopy jobs are aborted.
Workaround =
Ensure that the target media set for the duplicate/consolidate/datacopy jobs specifies only one drive as the target. The duplicate jobs will then use only that drive for their target media, so one drive is always free for the source media.
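The effect of this workaround can be seen in a toy model of a two-drive library. This is a simplified sketch, not a description of how the scheduler is implemented: it assumes each job locks the first free drive that its target set allows, and the function and drive names are made up for the example.

```python
from typing import List, Set


def free_drives_after_target_allocation(all_drives: List[str],
                                        target_drives: List[str],
                                        job_count: int) -> Set[str]:
    """Return the drives still free once every job has tried to lock a target drive."""
    free = set(all_drives)
    for _ in range(job_count):
        # Each job locks the first free drive permitted by its target set.
        # If none is free, the job simply queues and locks nothing.
        candidate = next((d for d in target_drives if d in free), None)
        if candidate is not None:
            free.discard(candidate)
    return free


drives = ["drive 1", "drive 2"]

# Target set allows any drive: both jobs grab one, nothing is left for source media.
unrestricted = free_drives_after_target_allocation(drives, drives, job_count=2)

# Target set restricted to one drive: the second job queues, 'drive 2' stays free.
restricted = free_drives_after_target_allocation(drives, ["drive 1"], job_count=2)

print(unrestricted)  # set()
print(restricted)    # {'drive 2'}
```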
#2 - Source media conflict -
Two duplicate/consolidate/datacopy jobs require the same source media because separate backup jobs wrote their savesets (now being targeted for restore/copy) to the same tape.
Workaround =
Ensure that the initial Phase 1 backup jobs use separate tapes on a per-job basis and do not share tapes, i.e. only one backup job's data is housed on a tape at a time.
This requires enough physical or virtual tapes to service each individual backup. If physical tapes are being used, additional tapes may be required.
If a virtual library is in use for the initial backup, it may need to be reconfigured and resized to provide distinct tapes for each backup run.
Once enough tapes are available, modify the initial backup job's options within the target set: enable the 'Ensure this backup is the first on the media' option.
Tapes will then not be shared between jobs, which prevents the conflict and the deadlock.
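To check whether existing tapes are already shared between backup jobs, a list of (backup job, tape) pairs can be grouped by tape. The sketch below assumes such a list has been compiled by hand or exported by some other means; the job names, tape labels and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shared_tapes(savesets: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return tape -> sorted backup jobs, for tapes holding more than one job's data."""
    jobs_by_tape = defaultdict(set)
    for job, tape in savesets:
        jobs_by_tape[tape].add(job)
    # Only tapes carrying savesets from two or more jobs can cause a source conflict.
    return {tape: sorted(jobs) for tape, jobs in jobs_by_tape.items() if len(jobs) > 1}


# Hypothetical (backup job, tape) pairs.
savesets = [
    ("backup-1", "TAPE01"),
    ("backup-2", "TAPE01"),
    ("backup-3", "TAPE02"),
]

conflicts = shared_tapes(savesets)
print(conflicts)  # {'TAPE01': ['backup-1', 'backup-2']}
```

An empty result means no tape in the list holds data from more than one backup job.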
© 2021 Quest Software Inc. ALL RIGHTS RESERVED.