SnapMirror to Tape backup to a device attached to a Smart Client fails with a 'Failed to start backup: Unknown error -1' error.
Description
The following messages and errors appear within the NetVault logs:
1] Information 2014/11/12 07:55:44 57812 Schedule netvault0 Starting phase 1 on netvault0
2] Job Message 2014/11/12 07:55:45 57812 Data Plugin netvault0 Host 'op-fsos1-pp', ID '0151705088', OS 'NetApp', Version 'NetApp Release 8.2P4 7-Mode'
3] Job Message 2014/11/12 07:55:45 57812 Data Plugin netvault0 NDMP SnapMirror to Tape 7.6.116 Backup
4] Background 2014/11/12 07:55:45 57812 Data Plugin netvault0 Filer advises DATA_BLOCK_SIZE is 240 KB
5] Information 2014/11/12 07:55:45 57812 Data Plugin netvault0 Block size requested is 240 KB
6] Job Message 2014/11/12 10:11:23 57812 Media netvault0 (gmh-backup-gmh: SL_DEC90302EY (HP MSL G3 Series)) Media in 'DRIVE 1:gmh-backup-gmh' assigned to job ready for data transfer
7] Background 2014/11/12 10:12:26 57812 Data Plugin netvault0 NDMP Backup Environment Variable 'FILESYSTEM' = '/vol/fsosesx1vp_fc0'
8] Background 2014/11/12 10:12:26 57812 Data Plugin netvault0 NDMP Backup Environment Variable 'DIRECT' = 'N'
9] Job Message 2014/11/12 10:12:26 57812 Data Plugin netvault0 Backup Request
With sub text: Backup Environment: SMTAPE_DELETE_SNAPSHOT = Y FILESYSTEM = /vol/fsosesx1vp_fc0 DIRECT = N
10] Information 2014/11/12 10:12:26 57812 Data Plugin netvault0 NDMP version is 4
11] Error 2014/11/12 10:27:58 57812 Data Plugin netvault0 Failed to start backup: Unknown error -1
With sub text:
NET 15 110 102758 Fatal error during TCP receive: Connection timed out
XDR 158 -1 102758 Failed to receiving XDR data
NDMP 5 2 102758 Error on Ndmp Channel
NDMP 89 0 102758 Ndmp channel has disconnected
MSGWAIT 37 0 102758 Desired source has gone down
NDMP 1715 0 102758 Failed to receive reply 40a, sequence 15
12] Error 2014/11/12 10:27:58 57812 Media gmh-backup-gmh gmh-backup-gmh SL_HU19014K07 (HP Ultrium 4-SCSI): had transfer aborted
As the sub text of the error shows, this points to a network issue outside of NVBU.
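When many job logs need triaging, the same reading of the sub text can be roughly automated. The following is a minimal sketch only: the function name `network_hints` and the phrase list are illustrative assumptions taken from the log excerpt in this article, not an official NetVault error catalogue.

```python
# Phrases copied from the sub text shown in this article. The list is an
# illustrative assumption, not an official NetVault error catalogue.
NETWORK_HINTS = (
    "Connection timed out",
    "Fatal error during TCP receive",
    "channel has disconnected",
    "Desired source has gone down",
)


def network_hints(sub_text):
    """Return the network-related phrases found in a sub text string."""
    lowered = sub_text.lower()
    return [hint for hint in NETWORK_HINTS if hint.lower() in lowered]


if __name__ == "__main__":
    sample = (
        "NET 15 110 102758 Fatal error during TCP receive: Connection timed out "
        "NDMP 89 0 102758 Ndmp channel has disconnected "
        "MSGWAIT 37 0 102758 Desired source has gone down"
    )
    for hint in network_hints(sample):
        print("network indicator:", hint)
```

A non-empty result suggests that the failure should be investigated at the network level first, as in this case.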
The NVBU environment details are as follows:
NetVault Server: 9.2 installed on SuSE 11.
NetVault Client: 9.2 installed on SuSE 11, with the tape library attached to it (i.e. a Smart Client).
So the backup channel runs NetApp Filer > NetVault Server with SMTT Plugin > Smart Client > backup device, i.e. the backup job has to communicate down a very long channel. Ideally, the tape library should be attached directly to the Filer via FC and set up with drives shared to all Clients, for a faster and more reliable backup channel.
Resolution
This issue could be due to faulty network hardware (network cards, switches, routers, cables, etc.) and/or network bottlenecks, and would need investigating. In addition, check the following items:
A] Is there a Firewall between the NetVault Server and the Smart Client? If so, have the NetVault Server, Client and Firewall been set up correctly?
B] Does this backup have Network Compression enabled in Advanced Options? If yes, disable it.
C] Make sure that the following settings are applied on the NetVault Server and the Smart Client (the rest of the Clients can also be amended) to help with backup performance and network reliability:
1] On any Windows machines: in Windows Services, open the NetVault Client Process Manager service and, on the Logon tab, tick “Interact with desktop”. If you upgrade the NetVault software in the future, the “Interact with desktop” selection will be cleared and will need to be selected again.
2] Network settings to improve network connections while a backup may be idle for a while:
Go to the “Network Manager” tab within the Configurator and change the following:
“Time in seconds to complete a remote connection” from 30 to 60.
"Time wait before dropping inactive connection" from 300 to 600.
“Keep Alive Rate” from 15 to 7.
“Time between availability broadcasts” from 300 to 600.
“Time between security broadcasts” from 300 to 600.
NetVault Configurator setup for NetVault Server ONLY: “Job Manager” tab, set "Job keep alive rate" to 1.
3] NetVault Server and Client Configurator, Process Manager tab: set “Shared Memory for Process Table” to 128000. On Linux machines, enter 128000 and then stop and restart the NetVault Services in the Configurator. If the Services do not start, the value is too high; halve it and try again, repeating until the NetVault Services start.
Once all the settings have been entered, stop and start the NetVault Services on the NetVault Server and Clients.
4] Configure any antivirus software on Windows NetVault Clients so that it does not scan any nv* processes or the BakBone Software and Quest Software folders, as scanning them will impact backup reliability, performance and network communication.
5] Make sure that the Linux IPTables firewall is disabled on the NetVault Server and Clients (you may have to uninstall IPTables as well).
6] Make sure that the Windows Firewall is disabled on all Windows NetVault machines (if any).
D] Perform the following tests to make sure that NVBU can communicate freely between the NetVault Server, all Clients and backup devices; otherwise backup, media, device and communication issues can result:
a] From Server: ping the IP Address of the NetVault Client, then nslookup the NetVault Client IP Address and the NetVault Client Host Name.
b] From Client: ping the IP Address of the NetVault Server, then nslookup the NetVault Server IP Address and the NetVault Server Host Name.
c] From Server CLI: telnet [IP Address from Client] 20031
d] From Client CLI: telnet [IP Address from Server] 20031
Note: If there is no DNS Server set up to resolve names, you will have to amend the …/etc/hosts file on each machine to include its own IP Address and Name, as well as those of all other NetVault Clients.
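The name resolution and port tests in D] can also be scripted when several machines need checking. The following is a minimal sketch using only the Python standard library. The function names (`resolve_forward`, `resolve_reverse`, `port_open`, `check_peer`) are hypothetical helpers introduced for illustration, and the sketch does not replace the ping test, since it opens a TCP connection to port 20031 instead of sending ICMP.

```python
import socket
import sys

NETVAULT_PORT = 20031  # port used in the telnet tests above


def resolve_forward(host_name):
    """Return the IP address for host_name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(host_name)
    except OSError:
        return None


def resolve_reverse(ip_address):
    """Return the host name registered for ip_address, or None on failure."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return None


def port_open(host, port=NETVAULT_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peer(host_name):
    """Run the forward lookup, reverse lookup and port test for one machine."""
    ip = resolve_forward(host_name)
    return {
        "host": host_name,
        "ip": ip,
        "reverse_name": resolve_reverse(ip) if ip else None,
        "port_open": port_open(ip) if ip else False,
    }


if __name__ == "__main__":
    # Pass the host names of the other NetVault machines as arguments.
    for name in sys.argv[1:]:
        print(check_peer(name))
```

Run it on the NetVault Server with the Client names as arguments, and on each Client with the Server name. A missing `ip` or `reverse_name` points to a DNS or hosts file problem, while `port_open` being False points to a firewall or service problem.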