This chapter contains instructions for recovering Oracle replication along with the database and applications during a failover in a high-availability environment. To support these procedures, SharePlex must be properly configured to support high availability. See Configure replication to maintain high availability.
Recover replication if the primary system fails
Recover replication if the secondary Oracle instance fails
In an unplanned failure of the primary (source) machine, replicated data remaining in the SharePlex queues on that system will be unrecoverable as a result of buffering and the likelihood that the queues are corrupted. In a high-availability environment, you can move replication to the secondary (target) machine along with the database users to maintain data availability. When the primary system is restored, you can move users and replication back to that system with minimal downtime using a hot backup of the secondary instance.
For this procedure, you will activate a configuration file and then use the reconcile command to synchronize the results of the backup with ongoing replicated user changes after the copied instance is recovered.
Oracle database on Unix or Linux
This procedure moves replication to the secondary system after an unplanned failure of the primary system.
Verify that Export is stopped on the secondary system.
sp_ctrl> stop export
Use the qstatus command to view the post queue, and keep issuing this command until the number of backlogged messages is 0. Note: Do not wait for the actual numberof messages to be 0. If the primary system failed before the commit for some transactions were received, messages for these partial transactions will remain in the queue until cleared later in this procedure.
sp_ctrl> qstatus
This procedure moves users back to the primary machine after it is recovered from an unplanned failure. Follow each segment in the order presented.
On the primary system, start sp_cop using the -s option to prevent the SharePlex processes (Capture, Read, Export, Import, Post) from starting.
$ /productdir/bin/sp_cop -s &
On the primary system, deactivate the configuration file. When you copied the archived SharePlex variable-data directory to the primary system, you copied the configuration that was active before the system failed. This causes the Capture process to set the transaction number to “1” when replication from the primary system resumes.
sp_ctrl> deactivate config filename
On the primary system, delete the capture queue.
sp_ctrl> delete queue datasource:C
Example: sp_ctrl> delete queue o.oraA:C
On the primary system, delete the export queue.
sp_ctrl> delete queue queuename:X
Example: sp_ctrl> delete queue sysA:X
On the secondary system, delete the post queue.
sp_ctrl> delete queue queuename:P for datasource-datadest
Example: sp_ctrl> delete queue sysA:P for o.oraA-o.oraB
Note:
On the primary system, verify that all SharePlex processes are stopped.
sp_ctrl> status
On the primary system, stop Post.
sp_ctrl> stop post
On the secondary system, start Export. This establishes communication between the primary and secondary systems.:
sp_ctrl> start export
When the hot backup is finished, switch log files on the secondary system and make a note of the highest archive-log sequence number.
Open the database with the RESETLOGS option.
Note: This resets the sequences on the primary system to the top of the cache upon startup.
On the primary system, issue the reconcile command using the sequence number of the log that you noted previously. If you are using named post queues, issue the command for each one. If you do not know the queue names, issue the qstatus command first.
sp_ctrl> qstatus
sp_ctrl> reconcile queue queuename for datasource-datadest seq sequence_number
Example: reconcile queue SysB for o.oraA-o.oraA seq 1234
On the primary system, activate the configuration file.
sp_ctrl> activate config filename
On the primary system, start Post.
sp_ctrl> start post
On the primary system, view the status of the SharePlex processes.
sp_ctrl> status
Note: This command shows that Post stopped due to an error, which occurred because the object cache is missing.
On the primary system, issue the show log command with the filter option, and filter on the keyword "objcache."
sp_ctrl> show log filter=objcache
Look for a Post error message that refers to a file with a name similar to the following example:
0x0a0100c5+PP+sys4+sp_opst_mt+o.quest-o.ov-objcache_sp_opst_mt.18
You are looking for a string that contains the string “objcache_sp_opst_mt,” followed by a number. This is the object-cache file that the Post process needs. If you are using named post queues, there will be more than one error message, each referring to a different object-cache file but ending with the same number, such as the number .18 in the example.
Make a note of the full pathname of the Post object-cache file(s) named in the error message. The path will be the state directory in the SharePlex variable-data directory, for example:
splex_vardir/state/0x0a0100c5+PP+sys4+sp_opst_mt+o.quest-o.ov-objcache_sp_opst_mt.18
On the secondary system, shut down SharePlex.
sp_ctrl> shutdown
On the secondary system, change directories to the state sub-directory of the SharePlex variable-data directory and locate the Capture object-cache file. This file will have a name similar to the one in the following example.
o.quest-objcache_sp_ocap.18
Important! If there are more than one of these files, use the one with the most recent number at the end of it — this number should match the number at the end of the Post object-cache file, such as .18 in the example.
On the primary system, start Post.
sp_ctrl> start post
On the primary system, view the post queue and continue to issue the command until the number of messages is 0.
sp_ctrl> qstatus
On the secondary system, shut down the Oracle instance.
svrmgr1> shutdown
On the secondary system, start the Oracle instance.
svrmgr1> startup
Note: This resets the sequence on the secondary system to the top of the cache to synchronize with the primary system.
On the secondary system, stop Post.
sp_ctrl> stop post
On the secondary system, start Post.
sp_ctrl> start post
On the primary system, view the number of messages in the post queue, and keep checking until the number of messages is 0.
sp_ctrl> qstatus
On the secondary system, stop Export to prevent accidental changes made on that system from being replicated to the primary system.
sp_ctrl> stop export
Note: The secondary system is now in a failover-ready state again, with:
In an unplanned failure of the secondary (target) Oracle instance, the replication environment from that system to the primary system is corrupted. This procedure enables you to restore the replication configuration without affecting the database users on the primary system, and without the need to reactivate the configuration file on the primary system. Only the secondary configuration is affected.
This procedure cleans out the SharePlex queues and restores the target instance by means of a hot backup from the source system. You will use the reconcile command to synchronize the results of the backup with ongoing replicated user changes after the copied instance is recovered.
Oracle database on Unix or Linux
This procedure is divided into logical segments. Follow them in the order presented.
On the secondary system, stop Post.
sp_ctrl> stop post
On the secondary system, deactivate the configuration file.
sp_ctrl> deactivate config filename
Note: The deactivation causes the error “Error in sp_cnc.” in the Event Log. You may ignore it and continue with the procedure.
On the primary system, delete the post queue.
sp_ctrl> delete queue queuename:P for datasource-datadest
Example: sp_ctrl> delete queue sysB:P for o.oraA-o.oraB
Note: You are deleting the queues on the primary system because there could be messages remaining from uncommitted transactions sent from the secondary instance before it failed.
On the secondary system, delete the capture queue.
sp_ctrl> delete queue datasource:C
Example: sp_ctrl> delete queue o.oraB:C
On the secondary system, delete the export queue.
sp_ctrl> delete queue queuename:X
Example: sp_ctrl> delete queue sysB:X
Note: You are deleting the queues on the secondary system because the capture and export queues on that system still retain a record of the transactions that have already been processed.
On the primary system, switch log files.
On-premises database:
svrmgr1> alter system switch logfile;
Amazon RDS database:
Use Amazon RDS procedure rdsadmin.rdsadmin_util.switch_logfile.
In SQL*Plus, run the cleanup.sql script from the bin subdirectory of the SharePlex product directory.
On the secondary system, issue the reconcile command using the sequence number of the log that you noted previously. If you are using named post queues, issue the command for each one. If you do not know the queue names, issue the qstatus command first.
sp_ctrl> qstatus
sp_ctrl> reconcile queue queuename for datasource-datadest seq sequence_number
Example: reconcile queue SysA for o.oraA-o.oraA seq 1234
On the secondary system, after the sp_ctrl prompt returns, stop Export. This ensures that nothing accidentally gets replicated to the primary system when you activate the configuration on the secondary system.
sp_ctrl> stop export
On the secondary system, activate the configuration file.
sp_ctrl> activate config filename
On the secondary system, start Post.
sp_ctrl> start post
Use the status command to determine if any other SharePlex processes have a status of Stopped due to Error and start them.
sp_ctrl> status
sp_ctrl> start process
Note: The secondary system is now prepared for future failover.
In a planned failover of database activity to a secondary Oracle instance, you can quickly move SharePlex to the secondary system. While users continue their transactions on that system, SharePlex captures their changes and stores them until the primary system is back online and activity is moved back to that system.
Oracle database on Unix or Linux
This procedure is divided into logical segments. Follow them in the order presented. Do not shut down the primary instance until prompted in the procedure.
On the primary system, flush the data in the queues to the secondary system. This command stops Post on the secondary system and places a marker in the data stream to establish a synchronization point between the primary and secondary data.
sp_ctrl> flush datasource
where: datasource is the datasource specification of the primary Oracle instance, for example o.OraA.
On the secondary system, verify that Post stopped. (Continue to issue this command until it shows that Post stopped.)
sp_ctrl> status
On the primary system, verify that there are no messages in the capture and export queues. The Number of Messages and the Backlog (messages) fields must be 0.
sp_ctrl> qstatus
On the secondary system, verify that there are no messages in the post queue. The Number of Messages and the Backlog (messages) fields must be 0.
sp_ctrl> qstatus
On the primary system, shut down the Oracle instance with the abort option. Do not use the immediate option.
svrmgr1> shutdown abort
Note: This resets the sequence on the primary system to the top of the cache when the database starts.
On the secondary system, verify that Export is stopped. This prevents user changes from being replicated to the primary system until it is back online and SharePlex is ready to receive them. Stop Export if needed.
sp_ctrl> status
sp_ctrl> stop export
On the secondary system, start Export so that SharePlex sends the accumulated replicated data to the primary system.
sp_ctrl> start export
Note: SharePlex passes any sequence updates from the secondary system back to the primary system when Export starts.
On the primary system, stop Export.
sp_ctrl> stop export
On the primary system, allow Post to process the message backlog that was sent from the primary system.
On the secondary system, flush the data in the queues to the primary system.
sp_ctrl> flush datasource
where: datasource is the datasource specification of the secondary Oracle instance, for example o.OraB.
On the primary system, verify that Post stopped. (Continue to issue this command until it shows that Post stopped.)
sp_ctrl> status
On the secondary system, verify that there are no messages in the capture and export queues. The Number of Messages and the Backlog (messages) fields must be 0.
sp_ctrl> qstatus
On the primary system, verify that there are no messages in the post queue. The Number of Messages and the Backlog (messages) fields must be 0.
sp_ctrl> qstatus
On the secondary system, shut down SharePlex.
sp_ctrl> shutdown
On the secondary system, shut down the Oracle instance with the abort option. Do not use the immediate option.
svrmgr1> shutdown abort
Note: This resets the sequence on the secondary system to the top of the cache when the database starts.
On the secondary system, start the Oracle instance.
svrmgr1> startup
Note: The sequence on the secondary system is now at the top of the cache. When the next value is selected on the primary system, a new cache is acquired and is replicated to the secondary system. Now, the primary system is at the start of a cache, and the secondary system is at the top of a cache.
On the secondary system, stop Export. This prevents any accidental DML on that system from being replicated to the primary system.
sp_ctrl> stop export
On the primary system, start export.
sp_ctrl> start export
On the secondary system, start Post.
sp_ctrl> start post
Replication from the primary instance to the secondary instance is now active to keep the two databases synchronized and ready for future failover when needed.
© 2024 Quest Software Inc. ALL RIGHTS RESERVED. 使用条款 隐私 Cookie Preference Center