Before running the compare or repair commands, review these guidelines.
The following are commonly modified compare and repair parameters. Do not increase the values unless necessary. For details about these parameters, see their documentation in the SharePlex Reference Guide.
Parameter | Description |
---|---|
SP_DEQ_MALLOC |
This parameter controls the fetch batch size. The batch size controls the number of rows that SharePlex selects at once for comparison. Larger batch sizes increase processing speed but require more memory. The value is divided equally by the number of compare threads to be used, and then the batch size is recalculated based on all column sizes added together. |
SP_DEQ_PARRALLISM |
This parameter manages the select statement Degree of Parallelism hint. The parallelism option of the command overrides this setting. |
SP_DEQ_PART_TABLE_UPDATE | This parameter controls how the repair commands work on Oracle partitioned tables, depending on whether row movement is possible. |
SP_DEQ_READ_BUFFER_SIZE | This parameter controls the size of the buffer that holds fetched LONG and LOB data and can be adjusted based on available system memory. |
SP_DEQ_ROW_LOCK_THRESHOLD | This parameter sets a threshold that controls whether SharePlex uses row-level or table-level locking when a where option is used. |
SP_DEQ_SKIP_LOB |
This parameter determines whether or not LOBs are included in the compare/repair processing.
Set this parameter on the source system. |
SP_DEQ_TIMEOUT | This parameter sets a queue backlog threshold. High backlogs delay the establishment of a connection between the source and target compare/repair processes. If the backlog meets or exceeds this value, any compare or repair command that is issued on the source will exit and return an error. If this happens, consider running the compare or repair when the system is less busy. |
The recommended procedure for maintaining synchronized data through the comparison and repair commands is to run the compare or compare using command first, then view the results with the repair status command. This command shows any rows that are out-of-sync and the possible cause. Unless the cause of the out-of-sync condition is corrected, replication will go out of synchronization again, even if you repair the rows this time. After the problem is fixed, issue the repair or repair using command.
You can run the repair or repair using command without doing a preliminary comparison. The command performs a comparison first, to identify the out-of-sync rows, and then it repairs those rows. However, the underlying cause of the out-of-sync condition must be corrected to prevent future out-of-sync conditions.
See for causes and solutions for out-of-sync conditions.
To view the status or results of a comparison, use the compare status command in sp_ctrl.
To view the status or results of a repair, use the repair status command in sp_ctrl.
The best time to repair a target table depends on its size, the cause of the problem, the extent of out-of-sync rows, and how long you are willing to tolerate users being locked out. Before you initiate a repair, consider the following:
If you must repair a table immediately, but cannot tolerate locks or replication latency, you can use the where option to limit the repair to certain rows. An alternative is to use the key option, but this option may cause the repair to miss some out-of-sync rows.
To get additional information and syntax for the compare and repair commands, see the command documentation in the SharePlex Reference Guide.
This chapter contains instructions for recovering replication along with the database and applications during a failover in a high-availability environment. To support these procedures, SharePlex must be properly configured to support high availability. See Configure replication to maintain high availability.
Recover replication if the primary system fails
Recover replication if the secondary Oracle instance fails
In an unplanned failure of the primary (source) machine, replicated data remaining in the SharePlex queues on that system will be unrecoverable as a result of buffering and the likelihood that the queues are corrupted. In a high-availability environment, you can move replication to the secondary (target) machine along with the database users to maintain data availability. When the primary system is restored, you can move users and replication back to that system with minimal downtime using a hot backup of the secondary instance.
For this procedure, you will activate a configuration file and then use the reconcile command to synchronize the results of the backup with ongoing replicated user changes after the copied instance is recovered.
Oracle database on Unix or Linux
This procedure moves replication to the secondary system after an unplanned failure of the primary system.
Verify that Export is stopped on the secondary system.
sp_ctrl> stop export
Use the qstatus command to view the post queue, and keep issuing this command until the number of backlogged messages is 0. Note: Do not wait for the actual numberof messages to be 0. If the primary system failed before the commit for some transactions were received, messages for these partial transactions will remain in the queue until cleared later in this procedure.
sp_ctrl> qstatus
This procedure moves users back to the primary machine after it is recovered from an unplanned failure. Follow each segment in the order presented.
On the primary system, start sp_cop using the -s option to prevent the SharePlex processes (Capture, Read, Export, Import, Post) from starting.
$ /productdir/bin/sp_cop -s &
On the primary system, deactivate the configuration file. When you copied the archived SharePlex variable-data directory to the primary system, you copied the configuration that was active before the system failed. This causes the Capture process to set the transaction number to “1” when replication from the primary system resumes.
sp_ctrl> deactivate config filename
On the primary system, delete the capture queue.
sp_ctrl> delete queue datasource:C
Example: sp_ctrl> delete queue o.oraA:C
On the primary system, delete the export queue.
sp_ctrl> delete queue queuename:X
Example: sp_ctrl> delete queue sysA:X
On the secondary system, delete the post queue.
sp_ctrl> delete queue queuename:P for datasource-datadest
Example: sp_ctrl> delete queue sysA:P for o.oraA-o.oraB
Note:
On the primary system, verify that all SharePlex processes are stopped.
sp_ctrl> status
On the primary system, stop Post.
sp_ctrl> stop post
On the secondary system, start Export. This establishes communication between the primary and secondary systems.:
sp_ctrl> start export
When the hot backup is finished, switch log files on the secondary system and make a note of the highest archive-log sequence number.
Open the database with the RESETLOGS option.
Note: This resets the sequences on the primary system to the top of the cache upon startup.
On the primary system, issue the reconcile command using the sequence number of the log that you noted previously. If you are using named post queues, issue the command for each one. If you do not know the queue names, issue the qstatus command first.
sp_ctrl> qstatus
sp_ctrl> reconcile queue queuename for datasource-datadest seq sequence_number
Example: reconcile queue SysB for o.oraA-o.oraA seq 1234
On the primary system, activate the configuration file.
sp_ctrl> activate config filename
On the primary system, start Post.
sp_ctrl> start post
On the primary system, view the status of the SharePlex processes.
sp_ctrl> status
Note: This command shows that Post stopped due to an error, which occurred because the object cache is missing.
On the primary system, issue the show log command with the filter option, and filter on the keyword "objcache."
sp_ctrl> show log filter=objcache
Look for a Post error message that refers to a file with a name similar to the following example:
0x0a0100c5+PP+sys4+sp_opst_mt+o.quest-o.ov-objcache_sp_opst_mt.18
You are looking for a string that contains the string “objcache_sp_opst_mt,” followed by a number. This is the object-cache file that the Post process needs. If you are using named post queues, there will be more than one error message, each referring to a different object-cache file but ending with the same number, such as the number .18 in the example.
Make a note of the full pathname of the Post object-cache file(s) named in the error message. The path will be the state directory in the SharePlex variable-data directory, for example:
splex_vardir/state/0x0a0100c5+PP+sys4+sp_opst_mt+o.quest-o.ov-objcache_sp_opst_mt.18
On the secondary system, shut down SharePlex.
sp_ctrl> shutdown
On the secondary system, change directories to the state sub-directory of the SharePlex variable-data directory and locate the Capture object-cache file. This file will have a name similar to the one in the following example.
o.quest-objcache_sp_ocap.18
Important! If there are more than one of these files, use the one with the most recent number at the end of it — this number should match the number at the end of the Post object-cache file, such as .18 in the example.
On the primary system, start Post.
sp_ctrl> start post
On the primary system, view the post queue and continue to issue the command until the number of messages is 0.
sp_ctrl> qstatus
On the secondary system, shut down the Oracle instance.
svrmgr1> shutdown
On the secondary system, start the Oracle instance.
svrmgr1> startup
Note: This resets the sequence on the secondary system to the top of the cache to synchronize with the primary system.
On the secondary system, stop Post.
sp_ctrl> stop post
On the secondary system, start Post.
sp_ctrl> start post
On the primary system, view the number of messages in the post queue, and keep checking until the number of messages is 0.
sp_ctrl> qstatus
On the secondary system, stop Export to prevent accidental changes made on that system from being replicated to the primary system.
sp_ctrl> stop export
Note: The secondary system is now in a failover-ready state again, with:
© ALL RIGHTS RESERVED. Feedback Terms of Use Privacy