Contingency Planning Overview
In any InTrust organization, some components are critical to InTrust operation. If these components are damaged in a disaster, the whole system will fail, and valuable data may be irrevocably lost. It is therefore strongly recommended that you back up the following:
- InTrust Servers
- Configuration Database
Audit database failure is generally less critical than failure of other components, because in a typical workflow data is collected to a repository. A repository backup will help you restore your Audit database: after you recover the repository, you can import the necessary events into the database. However, it is still recommended that you periodically back up your Audit database and other InTrust components, as described in this guide.
Backup Procedures for InTrust
To minimize the risk of irrevocable data loss, it is strongly recommended that you perform backup procedures for your InTrust components, as follows:
- InTrust Servers: either weekly, or after new agents are added.
- Configuration database: after every configuration change, and periodically to account for newly installed agents (daily backup recommended). Alternatively, set up configuration database replication, as described in Replication of the InTrust Configuration Database. This helps ensure InTrust configuration consistency across the enterprise and increases your InTrust organization's fault tolerance.
- Repository: after each gathering process, that is, according to the gathering process schedule; at least daily backup recommended.
- Audit database: according to the gathering process schedule (frequency); daily backup recommended.
- Alert database: daily backup recommended.
- InTrust agents: twice a week recommended.
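The schedule above can be expressed as data and checked programmatically. The following Python sketch is a hypothetical illustration, not an InTrust feature: the component keys, MAX_BACKUP_INTERVAL, and overdue_components are invented for this example, and only the periodic part of each recommendation is modeled (event-driven triggers such as configuration changes or completed gathering are not).

```python
from datetime import datetime, timedelta

# Hypothetical mapping of each component to the longest recommended gap
# between backups. Only the periodic recommendations are modeled; event-driven
# triggers (new agents, configuration changes, finished gathering) are not.
MAX_BACKUP_INTERVAL = {
    "InTrust Servers": timedelta(weeks=1),
    "Configuration database": timedelta(days=1),
    "Repository": timedelta(days=1),
    "Audit database": timedelta(days=1),
    "Alert database": timedelta(days=1),
    "InTrust agents": timedelta(days=3, hours=12),  # about twice a week
}


def overdue_components(last_backup, now):
    """Return components whose last backup is older than the recommended interval.

    last_backup maps a component name to the datetime of its last successful
    backup; a component with no recorded backup is reported as overdue.
    """
    overdue = []
    for component, interval in MAX_BACKUP_INTERVAL.items():
        taken = last_backup.get(component)
        if taken is None or now - taken > interval:
            overdue.append(component)
    return overdue


# Example: everything was backed up an hour ago except the repository.
now = datetime(2024, 1, 10, 12, 0)
last = {name: now - timedelta(hours=1) for name in MAX_BACKUP_INTERVAL}
last["Repository"] = now - timedelta(days=2)
print(overdue_components(last, now))  # ['Repository']
```

A dictionary keeps the schedule in one place, so adjusting an interval to match your own policy does not require touching the checking logic.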
InTrust provides InTrust Server failover, which automatically switches operations to a standby server when a production server fails. It is recommended that you set up this feature as follows:
1. Configure two InTrust Servers in your InTrust organization:
   - A production InTrust Server that performs gathering and real-time monitoring
   - A standby InTrust Server that takes over operations if the production server goes down
2. Create an InTrust site containing the standby InTrust Server, and specify this server's name when prompted for the InTrust Server responsible for processing the site.
3. To monitor the state of the production InTrust Server, enable the “InTrust server is down” monitoring rule (located in the InTrust Internal Events | InTrust server failover rule group) on the standby server, and activate the rule's response action (failover script execution).
4. When configuring this rule, select to perform matching on the server side. You can also specify:
   - Which InTrust Servers to monitor
   - How long to wait for a response from a monitored server before it is considered down
5. Create and activate a monitoring policy that includes this rule and the InTrust site created in step 2.
If the production InTrust Server fails, the standby InTrust Server takes over the sites and tasks that the failed server was processing.
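To make the "how long to wait for a response" setting concrete, the following Python sketch models how a standby server might decide that a monitored production server is down. It is an assumption-based illustration of a simple timeout check, not InTrust's actual implementation; FailoverMonitor, record_response, and servers_down are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FailoverMonitor:
    """Hypothetical model of the standby server's "InTrust server is down" check.

    monitored: names of the production InTrust Servers to watch.
    timeout: how long to wait for a response before a server is considered down.
    """
    monitored: set
    timeout: timedelta
    last_response: dict = field(default_factory=dict)

    def record_response(self, server, when):
        # Ignore responses from servers that are not being monitored.
        if server in self.monitored:
            self.last_response[server] = when

    def servers_down(self, now):
        """Return monitored servers that have not responded within the timeout."""
        down = []
        for server in sorted(self.monitored):
            seen = self.last_response.get(server)
            if seen is None or now - seen > self.timeout:
                down.append(server)
        return down
```

In this model, a server that has never responded is treated as down; a real deployment may handle startup differently.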
Caution: To ensure the availability and integrity of InTrust databases and repositories, it is recommended that you place them on computers other than the InTrust Servers. This minimizes the risk of losing them if an InTrust Server goes down.
If you plan to install agents manually (for example, because your organization's policies do not allow automatic agent installation), establish agent-server communication with both the production and standby InTrust Servers when you install and configure the agents. This allows agents to connect to the standby server if a failover occurs. For details, refer to Installing Agents Manually.
How to Recover Your InTrust Deployment
The following topics describe problems that may occur as a result of a disaster, how to solve them if you have properly backed up your data, and what to do if you have not.
InTrust Server Recovery
InTrust Server and Its Temporary Files Corrupted due to Disk Failure
Disk backup available:
Restore the InTrust Server and its temporary files to their original location.
Disk backup not available:
Use the InTrust failover capability to switch to another InTrust Server in your organization. To do this, enable the “InTrust server is down” real-time monitoring rule (from the “InTrust server failover” rule group) on the standby InTrust Server so that it monitors the status of the current InTrust Server:
- On the General tab of the rule’s Properties dialog, make sure the rule is enabled.
- On the Response Actions tab, make sure Failover script execution is selected. Save the settings and commit the changes.
If a failure occurs, you will get a notification, and the standby server will take over the sites and tasks that the failed InTrust Server was processing.
You can also perform a failover manually by using the Server Switching Wizard:
- In InTrust Manager, select Configuration | InTrust Servers, and from the current InTrust Server’s shortcut menu, select Failover | Switch. Then follow the steps of the wizard:
  - Select the InTrust sites and jobs to be switched.
  - Specify the InTrust Server that will take over the operations.
  - Finish the wizard and commit the changes.
After you restore the InTrust Server, you can roll back this switching session (that is, switch the sites and jobs back to the server originally responsible for processing them):
- Start the Rollback Wizard by selecting Failover | Roll Back from the restored server’s shortcut menu, and select the session to roll back.
- Commit the changes after finishing the wizard.
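Conceptually, a switching session records which sites and jobs moved from which server to which, so that a rollback can reverse exactly those assignments. The Python sketch below models this idea under that assumption; SwitchingSession, switch, and roll_back are hypothetical names and do not correspond to any InTrust API.

```python
from dataclasses import dataclass


@dataclass
class SwitchingSession:
    """Hypothetical record of one Server Switching Wizard run."""
    source: str   # InTrust Server being switched from
    target: str   # InTrust Server taking over the operations
    items: list   # names of the sites and jobs that were actually switched


def switch(assignments, items, source, target):
    """Reassign the selected sites/jobs from source to target and record the session."""
    moved = [item for item in items if assignments.get(item) == source]
    for item in moved:
        assignments[item] = target
    return SwitchingSession(source, target, moved)


def roll_back(assignments, session):
    """Return the session's sites/jobs to the server originally processing them."""
    for item in session.items:
        # Only undo items that are still assigned to the takeover server.
        if assignments.get(item) == session.target:
            assignments[item] = session.source
```

Recording only the items that were actually moved means a rollback never touches sites or jobs that the session did not switch.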
Notes and Caveats
- If you use role-based administration in your InTrust deployment, note that to run the Server Switching Wizard, a user must have Modify permission for the sites and jobs being switched (their nodes in InTrust Manager) and for the node of the InTrust Server being switched from.
- By default, passwords for agent-server connections expire three days after they are set. Therefore, if you back up the InTrust program folder daily and restore it on the new server within three days, the agents should still be able to connect to the server. The agent password expiration policy can be adjusted in the configuration database.
- If the InTrust Server that went down was hosting any Data Stores used by jobs that were running at that moment, those jobs will fail, and you will have to re-create them. For example, if a gathering job was using an Audit database located on the failed server, you have to re-create that job.
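The password-expiration note above reduces to a simple time comparison. The following Python sketch is a simplified, hypothetical model: it assumes the restored backup contains the password that was set at password_set_at, and that the default three-day lifetime applies unless you pass a different value to match your configured policy. The function and constant names are invented for illustration.

```python
from datetime import datetime, timedelta

# Default lifetime of agent-server connection passwords, per the note above.
# The real value is a policy setting stored in the configuration database.
DEFAULT_PASSWORD_LIFETIME = timedelta(days=3)


def agents_can_reconnect(password_set_at, restored_at,
                         lifetime=DEFAULT_PASSWORD_LIFETIME):
    """Simplified check: is the password captured in the backup still valid
    at the moment the InTrust Server is restored?"""
    return restored_at < password_set_at + lifetime
```

For example, with a password set on March 1 at midnight, a restore two days later passes the check, while a restore three days and one hour later does not.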
System Disk Failure on the InTrust Server Computer
Disk backup available:
Restore files from the backup.
Disk backup not available:
Use InTrust failover capabilities, as described above.
InTrust Server IP Address Changed
Details: After the server is restarted, connection with the agents is lost.
No agents installed on computers across a firewall:
- If an agent was installed automatically, it is recovered, and the agent-server connection is re-established automatically after the heartbeat interval or when the gathering process starts.
- If an agent was installed manually and the agent-server connection was also established manually, the connection is re-established automatically after the heartbeat interval or when the gathering process starts (assuming gathering is performed using agents). However, make sure the account under which the InTrust Server runs can access the target computers; otherwise, you need to establish the agent-server connection manually. For details, see Installing Agents Manually.
Several agents installed on computers across a firewall:
The agent-server connection for these agents must be established manually. For details, see Installing Agents Manually.
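The cases above can be summarized as a small decision function. This Python sketch is a hypothetical restatement of the cases for a single agent; the function and parameter names are invented for illustration. It returns True when the agent is expected to reconnect on its own and False when you must establish the agent-server connection manually.

```python
def reconnects_automatically(across_firewall, installed_automatically,
                             server_account_can_access_agent):
    """Return True if the agent should reconnect on its own after the server IP change.

    "On its own" means after the heartbeat interval or when a gathering process
    that uses agents starts; False means the agent-server connection must be
    established manually (see Installing Agents Manually).
    """
    if across_firewall:
        # Agents on computers across a firewall always need a manual connection.
        return False
    if installed_automatically:
        # Automatically installed agents are recovered and reconnect by themselves.
        return True
    # Manually installed agents reconnect automatically only if the account the
    # InTrust Server runs under can access the target computer.
    return server_account_can_access_agent
```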
Caution: After recovery, an agent tries to connect to the InTrust Server whose name (NetBIOS name, FQDN, or IP address) was provided to the agent during installation (that is, when the server was registered on the agent).
If you specified the FQDN (recommended), the agent looks up the InTrust Server by this name and connects to it automatically.
However, if you specified the server's IP address (for example, in a DMZ or because of DNS problems) and that address later changed, you must re-register the server on the agent, as described in the Establishing a Connection with the Server topic in Installing Agents Manually.
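The difference between registering the server by name and by IP address can be illustrated with Python's standard ipaddress module. The sketch below is a hypothetical helper, not part of InTrust; it assumes that registered names (FQDN or NetBIOS) are resolved again at connection time, so only an IP literal that no longer matches the server's address requires re-registration.

```python
import ipaddress


def needs_reregistration(registered_server, current_server_ip):
    """Return True if the server address registered on the agent is a stale IP literal.

    Assumption: names (FQDN or NetBIOS) are resolved again when the agent
    reconnects, so they follow the server's new address; an IP address stored
    at registration time does not.
    """
    try:
        registered_ip = ipaddress.ip_address(registered_server)
    except ValueError:
        # Not an IP literal: treat it as a name that is resolved at connect time.
        return False
    return registered_ip != ipaddress.ip_address(current_server_ip)
```

For example, an agent registered with 10.0.0.5 needs re-registration after the server moves to 10.0.0.9, while an agent registered with intrust01.example.com does not.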