The following symptoms are observed:
Multiple virtual machines in the user's environment have snapshots that fail to commit at the end of a backup job.
VMs may freeze at the start and at the end of a backup job.
To measure storage latency on the host:
1. Open a PuTTY (SSH) session to the host where the affected VM resides.
2. Run the "esxtop" command.
3. Press 'd' to switch to the disk adapter view.
4. Observe the DAVG/cmd value.
DAVG/cmd is the average response time, in milliseconds, per command sent to the device. This latency is measured at the device driver level and includes the round-trip time between the HBA (host bus adapter) and the storage.
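For a longer observation window than the interactive view allows, esxtop can also run in batch mode (for example "esxtop -b -d 5 -n 60 > capture.csv") and write its counters to a CSV file. The Python sketch below shows one way to pull the DAVG columns out of such a capture. It assumes that DAVG/cmd appears in batch output under the counter name "Average Driver MilliSec/Command"; check the header of your own capture before relying on it. The function name davg_samples is hypothetical.

```python
import csv
import io

# Assumption: in esxtop batch output, DAVG/cmd is reported under this counter
# name. Verify against the header row of your own capture.
DAVG_COUNTER = "Average Driver MilliSec/Command"


def davg_samples(csv_text):
    """Return {column_header: [latency_ms, ...]} for every DAVG column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Keep only the columns whose header ends with the DAVG counter name.
    columns = [i for i, name in enumerate(header) if name.endswith(DAVG_COUNTER)]
    samples = {header[i]: [] for i in columns}
    for row in reader:
        for i in columns:
            # Skip short rows and empty cells instead of failing on them.
            if i < len(row) and row[i] != "":
                samples[header[i]].append(float(row[i]))
    return samples
```

Reading the file with open("capture.csv").read() and passing the text to davg_samples gives one list of readings per adapter, in capture order.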
If the user's environment uses SAN-connected storage (Fibre Channel or iSCSI), perform the following test:
a. Choose a high-I/O VM in the user's environment to test.
b. In vCenter, open Snapshot Manager for the VM and create a test snapshot without memory (enable quiescing if it is enabled in the backup job in question).
c. Leave this snapshot open for as long as the VM backup normally takes to complete.
d. After this period, run steps 1-4 outlined above.
e. In vCenter, under Snapshot Manager, delete the test snapshot and observe the DAVG/cmd value.
DAVG is a good indicator of the performance of the back-end storage.
If this value is over 15 ms per command and the response time consistently stays at or above this level, this indicates an issue on the SAN, either on the storage array or on the FC switches.
The user needs to investigate the SAN. Adding disk spindles, changing the RAID level, or checking port configuration and performance statistics on the FC switches may help in such cases.
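The distinction between a brief spike and a response time that "consistently stays" above 15 ms can be made explicit with a small check over a series of DAVG readings. The following Python sketch is only an illustration: the 15 ms threshold comes from the text above, while the function name and the min_consecutive parameter (how many readings in a row count as "consistent") are assumptions that should be tuned to the sampling interval in use.

```python
DAVG_THRESHOLD_MS = 15.0  # threshold stated in the article


def is_sustained_high(samples_ms, threshold_ms=DAVG_THRESHOLD_MS, min_consecutive=6):
    """Return True if min_consecutive readings in a row exceed threshold_ms.

    min_consecutive is an illustrative assumption, not a vendor-defined value.
    """
    run = 0
    for value in samples_ms:
        # Extend the current run on a high reading, reset it otherwise.
        run = run + 1 if value > threshold_ms else 0
        if run >= min_consecutive:
            return True
    return False
```

With readings taken every 5 seconds, the default of 6 corresponds to roughly 30 seconds of continuously high latency; a single reading of 40 ms surrounded by normal values would not trigger it.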
If the response time stays at this level, related events may be logged. I/O deterioration, abort messages, and other SCSI errors can be reviewed in the following logs on the host:
ESX 3.5 and 4.x - /var/log/vmkernel
ESXi 3.5 and 4.x - /var/log/messages
ESXi 5.x - /var/log/vmkernel.log
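When these logs are large, a simple filter can narrow them down to the lines worth reading. The Python sketch below maps each version to the log path listed above and keeps lines that mention deterioration, aborts, or SCSI. The keywords are illustrative assumptions only, since the exact message wording differs between releases, and the names VMKERNEL_LOGS and matching_lines are hypothetical. It is intended to be run against a copy of the log on a workstation.

```python
import re

# Log locations taken from the list above, keyed by (product, version).
VMKERNEL_LOGS = {
    ("ESX", "3.5"): "/var/log/vmkernel",
    ("ESX", "4.x"): "/var/log/vmkernel",
    ("ESXi", "3.5"): "/var/log/messages",
    ("ESXi", "4.x"): "/var/log/messages",
    ("ESXi", "5.x"): "/var/log/vmkernel.log",
}

# Assumption: these keywords are a rough filter, not an exhaustive or
# vendor-defined list of storage error messages.
PATTERN = re.compile(r"deteriorat|abort|scsi", re.IGNORECASE)


def matching_lines(lines):
    """Return the lines that contain any of the illustrative keywords."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]
```

For example, opening a copied vmkernel.log and passing the file object to matching_lines returns the candidate lines in their original order, which can then be compared with the times at which DAVG/cmd was high.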