Title: Diagnosing Netvault Scheduler Process Crash.
Date: 02/2007
NV Version: Up to NVBU 7.4.5u4
OS Version: Any
Application version: NA
Plugin version: NA
Description: Instructions for providing diagnostics for a Netvault Scheduler Process Crash.
Symptoms: The Netvault Scheduler process (nvsched) crashes and the binary log shows:
Error 2007/02/21 11:20:54 102 Jobs nv_kerri Scheduler exited unexpectedly, abandoning job
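If the binary log has been dumped to a text file, the symptom can be located with a simple search. The following is a minimal sketch only: it assumes the dump holds one log entry per line in the same layout as the example above, and the function name is hypothetical, not part of Netvault.

```python
def find_scheduler_crashes(lines):
    """Return the log lines that report an unexpected scheduler exit."""
    # Message text taken from the example log entry in this article.
    marker = "Scheduler exited unexpectedly"
    return [line.rstrip("\n") for line in lines if marker in line]
```

For example, passing an open file object for the text dump returns every matching entry, which gives the date and time of each crash.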
A likely cause of a scheduler crash is a corrupt Netvault schedule database. However, in cases like these it is not possible for us to diagnose corruption of the schedule database in isolation. The 7.4.5 release did fix a problem with nvpmg.exe crashing, but this was not related to the scheduler crashing.
Usually one of two things causes a Netvault process to crash:
a) an illegal access caused by a bad pointer, OR
b) an attempt to use memory that was not successfully allocated beforehand (often because the allowed process memory limit has been reached)
To help BakBone Support diagnose the problem, please perform the following:
#1 Check the size of the Netvault schedule database directory
The schedule database can be found in the following directories (by default):
UNIX: ~netvault/db/ScheduleDatabase/
or
Windows: C:\Program Files\BakBone Software\NetVault\db\ScheduleDatabase
The Netvault schedule database is not compacted, so it may be worth restoring the schedule database through the Netvault database plugin. A restore re-orders and packs the underlying database records and removes any 'null' entries, resulting in a smaller schedule database that causes the scheduler process to consume less memory. This in itself may resolve a schedule database corruption if the corruption is not extensive. Please note that restoring the schedule database may result in some jobs starting if their start time has passed; you will need to abort these jobs manually.
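The directory size in step #1 can be measured before and after a restore with a short script. This is a minimal sketch, not a BakBone tool: the function name is hypothetical, and the path you pass in is assumed to be one of the default locations listed above.

```python
import os

def directory_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Do not follow symbolic links; count real files only.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file disappeared or is unreadable; skip it.
                pass
    return total
```

On UNIX, os.path.expanduser("~netvault/db/ScheduleDatabase") expands the default path, provided a netvault user exists on the server. Record the figure before and after the restore to confirm that the database has shrunk.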
#2 Check the number of memory objects assigned to the schedule manager process over time:
- Stop and restart the Netvault server service.
- Periodically start the utility /usr/netvault/bin/nvpview.
- Select the schedule manager process from the process window and press the log object button. This writes an entry into the Netvault binary logs in the following format:
Background 2007/01/19 16:43:23 0 Schedule nv_kerri Schedule Manager: 1021 objects, 3669 blocks of memory
Look at the binary log and check whether the object count and memory usage are continually growing without being released. You will need to filter on ALL to see these messages. Please send a binary log dump of this to BakBone Support.
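The growth check in step #2 can be done by eye, but with many samples a script may help. The sketch below assumes the binary log has been dumped to text with one entry per line in the format shown above. The function names and the growth rule (a count that never decreases and ends higher than it started) are illustrative assumptions, not BakBone's own criteria.

```python
import re

# Matches entries such as:
# Background 2007/01/19 16:43:23 0 Schedule nv_kerri Schedule Manager: 1021 objects, 3669 blocks of memory
LOG_OBJECT_RE = re.compile(
    r"^\S+\s+(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+.*?"
    r"Schedule Manager:\s+(\d+) objects,\s+(\d+) blocks of memory"
)

def parse_samples(lines):
    """Return (timestamp, objects, blocks) tuples in log order."""
    samples = []
    for line in lines:
        match = LOG_OBJECT_RE.search(line.strip())
        if match:
            samples.append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return samples

def grows_without_release(samples):
    """True if objects or blocks never decrease and end above the start."""
    if len(samples) < 2:
        return False
    for index in (1, 2):  # 1 = objects, 2 = blocks of memory
        values = [sample[index] for sample in samples]
        never_drops = all(b >= a for a, b in zip(values, values[1:]))
        if never_drops and values[-1] > values[0]:
            return True
    return False
```

A True result from grows_without_release(parse_samples(lines)) only suggests a leak. The more samples taken over a longer period, the more meaningful the result.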
#3 On a UNIX server, as root, run ulimit -a and send the output to BakBone Support. This allows us to see the server memory limits.
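The ulimit output can also be captured to a file for attaching to a support call. This is a sketch under assumptions: the function names and the use of /bin/sh are illustrative, and the limits reported are those inherited by the shell that the script starts, so it should be run as root in the same environment that starts Netvault.

```python
import subprocess

def collect_ulimit(shell="/bin/sh"):
    """Run `ulimit -a` in a shell and return its output as text."""
    # ulimit is a shell builtin, so it has to be run through a shell.
    result = subprocess.run(
        [shell, "-c", "ulimit -a"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

def save_ulimit(path, shell="/bin/sh"):
    """Write the `ulimit -a` output to the file at path."""
    with open(path, "w") as handle:
        handle.write(collect_ulimit(shell))
```

Calling save_ulimit("ulimit.txt") writes the limits to a file in the current directory. The file name is only an example.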
#4 Finally, set circular trace on the scheduler and process manager processes. The trace must be collected after the next crash and before Netvault is restarted. Then send the resulting trace to BakBone Support in an archive format.
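Once the trace has been collected, it can be packed into a single compressed archive for sending. The sketch below is a generic example: this article does not state where the trace files are written, so the directory is passed in by the caller, and the function name is hypothetical.

```python
import os
import tarfile

def archive_directory(src_dir, archive_path):
    """Pack src_dir and its contents into a gzip-compressed tar archive."""
    src_dir = os.path.abspath(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries relative to the directory name, not the full path.
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return archive_path
```

Write the archive to a location outside the directory being archived, so that the archive does not try to include itself.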