Title: NDMP backup fails with "job manager died" or "Child Process died"
NV Version: 7.1.2 and above
OS Version: all
This article describes how several NDMP jobs with multiple sub jobs, all running at the same time, can cause the Job Manager and child processes to die. It explains how to test whether this is a heap size problem and, if it is, how to edit the registry to resolve the problem permanently.
NDMP jobs with multiple sub jobs are scheduled at the same time and all fail with "job manager died" or "Child Process died". The extract below shows how this appears in the binary log.
Extract from binary logs
Information 2007/06/03 19:00:00 1122 Schedule ADMT Starting phase 1 on ADMT
Job Message 2007/06/03 19:00:00 1122 Jobs ADMT Starting job 1122 'Netapp 2 ndmp daily' (Phase 1 , Instance 8) for UID 0
Information 2007/06/03 19:00:03 1122 Data Plugin ADMT Backup '/vol/cognos'
Information 2007/06/03 19:00:03 1122 Data Plugin ADMT Backup '/vol/boeing'
Information 2007/06/03 19:00:03 1122 Data Plugin ADMT Backup '/vol/Vmware'
Information 2007/06/03 19:00:03 1122 Data Plugin ADMT Backup '/vol/Temp'
Information 2007/06/03 19:00:03 1122 Data Plugin ADMT Backup '/vol/TSProfiles'
Information 2007/06/03 19:00:03 0 GUI ADMT Job 1394 'Netapp 2 ndmp daily /vol/cognos' Submitted
Information 2007/06/03 19:00:03 0 GUI ADMT Job 1395 'Netapp 2 ndmp daily /vol/boeing' Submitted
Information 2007/06/03 19:00:03 0 GUI ADMT Job 1396 'Netapp 2 ndmp daily /vol/Vmware' Submitted
Information 2007/06/03 19:00:03 0 GUI ADMT Job 1397 'Netapp 2 ndmp daily /vol/Temp' Submitted
Information 2007/06/03 19:00:03 0 GUI ADMT Job 1398 'Netapp 2 ndmp daily /vol/TSProfiles' Submitted
In 7.1.1 and earlier, the Netvault service was given access to a heap of 3MB by virtue of the service being allowed to 'interact with the desktop'. This was identified by Microsoft as a potential security risk, so from version 7.1.2 this option is no longer enabled during installation, leaving the service with a default heap size of only 512kB.
Therefore the heap size needs to be increased to permit additional jobs to execute simultaneously. This involves editing the Windows registry and increasing the heap available to 'non-interactive' services.
N.B. This is a system-wide setting.
To confirm that this is the cause, edit the Netvault Process Manager service properties and, on the "Log On" tab, allow the service to interact with the desktop. If the errors disappear, untick this box again and apply the following long-term workaround.
Using a registry editor, navigate to:
\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems
and modify the data of the Windows value, specifically:
SharedSection=1024,3072,512
The 2nd component is the kB allocated for 'interactive' services and the 3rd for 'non-interactive'. Increasing this last number to 2048 has been found to allow up to 60 jobs to run concurrently.
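The change to the SharedSection entry can be illustrated with a small Python sketch. It only rewrites the text of the value and does not touch the registry. The function name set_noninteractive_heap and the shortened sample string are assumptions made for illustration; the real value on a given system contains more fields and should be copied from the registry editor.

```python
import re

def set_noninteractive_heap(windows_value, new_kb):
    """Return windows_value with the 3rd SharedSection component set to new_kb.

    Hypothetical helper for illustration; it edits text only.
    """
    match = re.search(r"SharedSection=(\d+),(\d+),(\d+)", windows_value)
    if match is None:
        raise ValueError("no three-part SharedSection entry found")
    first, interactive_kb, _old_noninteractive_kb = match.groups()
    replacement = f"SharedSection={first},{interactive_kb},{new_kb}"
    # Splice the new entry in, leaving every other field untouched.
    return windows_value[:match.start()] + replacement + windows_value[match.end():]

# Shortened, illustrative value (assumption); the real one holds more fields.
sample = (r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
          r"SharedSection=1024,3072,512 Windows=On")
updated = set_noninteractive_heap(sample, 2048)
print(updated)
```

Only the third number changes; the first two components and the rest of the value are passed through as they were.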
The consequence of setting this value too high is that Windows allows a maximum of 48MB across these services, so if 20 or more 'non-interactive' services attempt to run, where they would previously have used ~10MB (~20%), they now use ~40MB (over 80%) of the total resource available.
Excess services may die silently because Windows has exhausted its non-interactive heap.
Therefore the workarounds are:
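The arithmetic behind these figures can be checked with a short Python sketch. It assumes the 48MB total and the per-service heap sizes quoted in this article, and treats each service as consuming its full allocation; the function name heap_usage is hypothetical.

```python
TOTAL_KB = 48 * 1024  # 48MB maximum quoted in this article

def heap_usage(services, per_service_kb):
    """Return (MB used, percent of the 48MB total) for a number of services."""
    used_kb = services * per_service_kb
    return used_kb / 1024, 100 * used_kb / TOTAL_KB

for per_service_kb in (512, 2048):
    used_mb, percent = heap_usage(20, per_service_kb)
    print(f"20 services x {per_service_kb}kB = {used_mb:.0f}MB ({percent:.0f}% of 48MB)")
```

With 20 services this gives 10MB (about 21%) at 512kB each and 40MB (about 83%) at 2048kB each.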
#1 Modify job scheduling
You can spread out the jobs so that fewer are queuing at one time, for example by running the quicker jobs first. These finish quickly, so that when the slower jobs are initiated there are fewer jobs queuing.
Scheduling jobs later should not be too much of a disadvantage, since they would otherwise not be running but queuing.
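One way to plan such a schedule is sketched below in Python. The volume names come from the log extract above, but the estimated durations, the 15-minute gap and the function name stagger are all hypothetical; this is not a Netvault feature, only a planning aid whose output would be entered into the job schedules by hand.

```python
from datetime import datetime, timedelta

def stagger(jobs, first_start, gap):
    """Order jobs quickest-first and space their start times by `gap`.

    jobs is a list of (name, estimated_minutes) pairs.
    """
    ordered = sorted(jobs, key=lambda job: job[1])
    return [(name, first_start + i * gap) for i, (name, _minutes) in enumerate(ordered)]

# Durations in minutes are invented for illustration only.
jobs = [
    ("/vol/cognos", 90),
    ("/vol/boeing", 45),
    ("/vol/Vmware", 240),
    ("/vol/Temp", 10),
    ("/vol/TSProfiles", 30),
]
plan = stagger(jobs, datetime(2007, 6, 3, 19, 0), timedelta(minutes=15))
for name, start in plan:
    print(start.strftime("%H:%M"), name)
```

A fixed gap is the simplest choice; in practice the gaps would be tuned to how long the quicker jobs actually take.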
#2 Registry Edit
The memory available to non-interactive services system-wide can be modified as described above.
#3 Interact with Desktop
Enabling this not only changes the heap size available to the service but also grants additional privileges. The combination of networked clients and local privileges can represent a security risk.
#4 Netvault on Linux
This limitation is Windows-only; the Netvault server could be run on Linux instead.