Chat now with support
Chat with Support

Welcome, ApexSQL customers to Quest Support Portal click here for for frequently asked questions regarding servicing your supported assets.

SharePlex 9.0 - Administration Guide

About this Guide Conventions used in this guide Overview of SharePlex Run SharePlex Run multiple instances of SharePlex Execute commands in sp_ctrl Set SharePlex parameters Configure SharePlex to replicate data Configure replication to and from a container database Configure named queues Configure partitioned replication Configure SharePlex to maintain a change history target Configure a replication strategy Configure SharePlex to replicate Oracle DDL Set up error handling Transform data Configure SharePlex security features Activate replication in your production environment Monitor SharePlex Prevent and solve replication problems Repair out-of-sync Data Procedures to maintain Oracle high availability Make changes to an active replication environment Apply an Oracle application patch or upgrade Back up Oracle data on the source or target Tune the Capture process Tune the Post process Appendix A: Peer-To-Peer Diagram Appendix B: SharePlex environment variables

Before you use Compare and Repair

Before running the compare or repair commands, review these guidelines.

  • All of the SharePlex processes (Capture, Read, Export, Import, Post) must be running when you run a comparison or repair command.
  • The tables that you want to compare or repair must be part of an active configuration file.
  • Uncommitted transactions on a source table prevent the compare and repair processes from obtaining the brief locks they need to obtain read consistency. Make certain that all transactions are committed before you run a comparison or repair.
  • If a table is large, it will probably need to be sorted in the TEMP tablespace. Before running the compare or repair commands, the TEMP tablespace may need to be made larger. The size depends on the setting of the SP_DEQ_THREADS parameter or the threads option within the command syntax, both of which controls the number of processing threads used by SharePlex on the target. Each thread processes a table. At the default of two threads, the size of the tablespace should be larger than the sum of the sizes of the two largest tables. If you set the number of threads higher, then increase the size of the tablespace to accommodate a proportionate number of the largest tables. However,
  • The UNDO tablespace may also need to be increased. Based on transaction volume and the length of time it takes to compare the largest table, increase the size of the UNDO tablespace and increase the undo_retention database parameter to avoid an ORA-1555 Snapshot too old error. Tables with LOBs take much longer to compare or repair than tables without them.

The following are commonly modified compare and repair parameters. Do not increase the values unless necessary. For details about these parameters, see their documentation in the SharePlex Reference Guide.

Parameter Description
SP_DEQ_MALLOC

This parameter controls the fetch batch size. The batch size controls the number of rows that SharePlex selects at once for comparison. Larger batch sizes increase processing speed but require more memory. The value is divided equally by the number of compare threads to be used, and then the batch size is recalculated based on all column sizes added together.

SP_DEQ_PARRALLISM

This parameter manages the select statement Degree of Parallelism hint. The parallelism option of the command overrides this setting.

SP_DEQ_PART_TABLE_UPDATE This parameter controls how the repair commands work on Oracle partitioned tables, depending on whether row movement is possible.
SP_DEQ_READ_BUFFER_SIZE This parameter controls the size of the buffer that holds fetched LONG and LOB data and can be adjusted based on available system memory.
SP_DEQ_ROW_LOCK_THRESHOLD This parameter sets a threshold that controls whether SharePlex uses row-level or table-level locking when a where option is used.
SP_DEQ_SKIP_LOB

This parameter determines whether or not LOBs are included in the compare/repair processing.

  • When the parameter is set to the default of 0, the compare processes include LOBs in their processing.
  • When the parameter is set to 1, only non-LOB columns are compared and repaired. If LOBs are not modified once inserted, you can speed up processing by setting this parameter to 1.

Set this parameter on the source system.

SP_DEQ_TIMEOUT This parameter sets a queue backlog threshold. High backlogs delay the establishment of a connection between the source and target compare/repair processes. If the backlog meets or exceeds this value, any compare or repair command that is issued on the source will exit and return an error. If this happens, consider running the compare or repair when the system is less busy.

How to use the repair and compare commands

The recommended procedure for maintaining synchronized data through the comparison and repair commands is to run the compare or compare using command first, then view the results with the repair status command. This command shows any rows that are out-of-sync and the possible cause. Unless the cause of the out-of-sync condition is corrected, replication will go out of synchronization again, even if you repair the rows this time. After the problem is fixed, issue the repair or repair using command.

You can run the repair or repair using command without doing a preliminary comparison. The command performs a comparison first, to identify the out-of-sync rows, and then it repairs those rows. However, the underlying cause of the out-of-sync condition must be corrected to prevent future out-of-sync conditions.

See for causes and solutions for out-of-sync conditions.

To view the status or results of a comparison, use the compare status command in sp_ctrl.

To view the status or results of a repair, use the repair status command in sp_ctrl.

When to run a repair

The best time to repair a target table depends on its size, the cause of the problem, the extent of out-of-sync rows, and how long you are willing to tolerate users being locked out. Before you initiate a repair, consider the following:

  • Although the users of the tables are not usually affected by the brief locks that are applied when tables are compared, they are locked out of the target table for the duration of the repair process. For a small table, this might not be disruptive, but for a large table needing extensive repairs, the wait can be significant.
  • Locks on a target table can reduce posting performance if Post must wait for the repair to finish before it can apply changes to that table and move on to other tables. This increases the latency of the target data and causes operations to accumulate in the post queue. If the objects that Post needs to change are different from those being repaired, the two processes run simultaneously.
  • If you must repair a table immediately, but cannot tolerate locks or replication latency, you can use the where option to limit the repair to certain rows. An alternative is to use the key option, but this option may cause the repair to miss some out-of-sync rows.

  • If the repair can wait, correct the cause of the problem immediately and then do the repair during non-peak hours.
  • Replication latency can slow down the compare and repair processing. The message sent from the source to spawn the command processes on the target is sent through the queues along with regular replicated data. Delays caused by a data backlog will delay the spawn message and cause the process on the source to lose its read consistency, which results in errors. If possible, perform comparisons and repairs during off-peak hours.

How to run the compare and repair commands

To get additional information and syntax for the compare and repair commands, see the command documentation in the SharePlex Reference Guide.

Procedures to maintain Oracle high availability

This chapter contains instructions for recovering replication along with the database and applications during a failover in a high-availability environment. To support these procedures, SharePlex must be properly configured to support high availability. See Configure replication to maintain high availability.

Contents

Recover replication if the primary system fails

In an unplanned failure of the primary (source) machine, replicated data remaining in the SharePlex queues on that system will be unrecoverable as a result of buffering and the likelihood that the queues are corrupted. In a high-availability environment, you can move replication to the secondary (target) machine along with the database users to maintain data availability. When the primary system is restored, you can move users and replication back to that system with minimal downtime using a hot backup of the secondary instance.

For this procedure, you will activate a configuration file and then use the reconcile command to synchronize the results of the backup with ongoing replicated user changes after the copied instance is recovered.

Supported databases

Oracle database on Unix or Linux

Requirements

Procedure 1: Move replication to the secondary system

This procedure moves replication to the secondary system after an unplanned failure of the primary system.

  1. Verify that Export is stopped on the secondary system.

    sp_ctrl> stop export

  2. Use the qstatus command to view the post queue, and keep issuing this command until the number of backlogged messages is 0. Note: Do not wait for the actual numberof messages to be 0. If the primary system failed before the commit for some transactions were received, messages for these partial transactions will remain in the queue until cleared later in this procedure.

    sp_ctrl> qstatus

  3. Run the script that grants INSERT, UPDATE and DELETE access to all users on the secondary system.
  4. Run the script that enables triggers and constraints on the secondary system when users begin using this system.
  5. Implement the failover procedure for relocating users to the secondary system, including starting the applications.
  6. Move users to the secondary system and let them resume working, but do not start Export. Their transactions will now be accumulating in the export queue awaiting restoration of the source database. Note: If started, Export will repeatedly try to connect to the primary system, wasting system resources.
  7. Go to Procedure 2: Move replication to the restored primary system.

Procedure 2: Move replication to the restored primary system

This procedure moves users back to the primary machine after it is recovered from an unplanned failure. Follow each segment in the order presented.

Restore the replication environment on the primary system

  1. On the primary system, recover the SharePlex directories, system files and Oracle files from the backups and archives.
  2. On the primary system, start sp_cop using the -s option to prevent the SharePlex processes (Capture, Read, Export, Import, Post) from starting.

    $ /productdir/bin/sp_cop -s &

  3. On the primary system, start sp_ctrl.
  4. On the primary system, deactivate the configuration file. When you copied the archived SharePlex variable-data directory to the primary system, you copied the configuration that was active before the system failed. This causes the Capture process to set the transaction number to “1” when replication from the primary system resumes.

    sp_ctrl> deactivate config filename

Purge the queues

  1. Run sp_ctrl on the primary and secondary systems.
  2. On the primary system, delete the capture queue.

    sp_ctrl> delete queue datasource:C

    Example: sp_ctrl> delete queue o.oraA:C

  3. On the primary system, delete the export queue.

    sp_ctrl> delete queue queuename:X

    Example: sp_ctrl> delete queue sysA:X

  4. On the secondary system, delete the post queue.

    sp_ctrl> delete queue queuename:P for datasource-datadest

    Example: sp_ctrl> delete queue sysA:P for o.oraA-o.oraB

    Note:

    • You are issuing the delete queue command on the primary system because you restored the old capture and export queues when you restored the archived SharePlex directories.
    • You are issuing the delete queue command on the secondary system because data remaining in the post queue cannot be posted. The primary system failed before Post received a COMMIT for remaining transactions. SharePlex will rebuild the queues when you reactivate the configuration and the two systems are reconciled.

Start replication from secondary to primary system

  1. On the primary system, verify that all SharePlex processes are stopped.

    sp_ctrl> status

  2. On the primary system, stop Post.

    sp_ctrl> stop post

  3. On the secondary system, start Export. This establishes communication between the primary and secondary systems.:

    sp_ctrl> start export

Synchronize the source and target data

  1. On the secondary system, run an Oracle hot backup.
  2. When the hot backup is finished, switch log files on the secondary system and make a note of the highest archive-log sequence number.

  3. On the primary system, recover the primary database from the backup by using the UNTIL CANCEL option in the RECOVER clause, and cancel the recovery after the log with the number you recorded has been fully applied.
  4. Open the database with the RESETLOGS option.

    Note: This resets the sequences on the primary system to the top of the cache upon startup.

  5. On the primary system, issue the reconcile command using the sequence number of the log that you noted previously. If you are using named post queues, issue the command for each one. If you do not know the queue names, issue the qstatus command first.

    sp_ctrl> qstatus

    sp_ctrl> reconcile queue queuename for datasource-datadest seq sequence_number

    Example: reconcile queue SysB for o.oraA-o.oraA seq 1234

  6. On the primary system, disable triggers on the tables, or run the sp_add_trigger.sql utility script so that the triggers ignore the SharePlex user.
  7. On the primary system, disable check constraints and scheduled jobs that perform DML.
  8. On the primary system, log onto SQL*Plus as the SharePlex Oracle user and run the cleanup.sql utility from the bin sub-directory of the SharePlex product directory. This truncates the SharePlex tables and updates the SHAREPLEX_ACTID table.
  9. On the secondary system, truncate the SHAREPLEX_TRANS table. This table contains transaction information that the Post process was using before the primary system failed, and therefore that information is obsolete. Truncating the table restores transaction consistency between the two systems.

Activate replication on the primary system

  1. On the primary system, activate the configuration file.

    sp_ctrl> activate config filename

  2. On the primary system, start Post.

    sp_ctrl> start post

Restore the object cache

  1. On the primary system, view the status of the SharePlex processes.

    sp_ctrl> status

    Note: This command shows that Post stopped due to an error, which occurred because the object cache is missing.

  2. On the primary system, issue the show log command with the filter option, and filter on the keyword "objcache."

    sp_ctrl> show log filter=objcache

  3. Look for a Post error message that refers to a file with a name similar to the following example:

    0x0a0100c5+PP+sys4+sp_opst_mt+o.quest-o.ov-objcache_sp_opst_mt.18

    You are looking for a string that contains the string “objcache_sp_opst_mt,” followed by a number. This is the object-cache file that the Post process needs. If you are using named post queues, there will be more than one error message, each referring to a different object-cache file but ending with the same number, such as the number .18 in the example.

  4. Make a note of the full pathname of the Post object-cache file(s) named in the error message. The path will be the state directory in the SharePlex variable-data directory, for example:

    splex_vardir/state/0x0a0100c5+PP+sys4+sp_opst_mt+o.quest-o.ov-objcache_sp_opst_mt.18

  5. On the secondary system, shut down SharePlex.

    sp_ctrl> shutdown

  6. On the secondary system, change directories to the state sub-directory of the SharePlex variable-data directory and locate the Capture object-cache file. This file will have a name similar to the one in the following example.

    o.quest-objcache_sp_ocap.18

    Important! If there are more than one of these files, use the one with the most recent number at the end of it — this number should match the number at the end of the Post object-cache file, such as .18 in the example.

  7. Copy the Capture object-cache file to the primary system and rename it to the full pathname of the Post object-cache file that you noted previously.
  8. On the primary system, start Post.

    sp_ctrl> start post

Switch users back to the primary system

  1. On the secondary system, stop user access to the database.
  2. On the primary system, view the post queue and continue to issue the command until the number of messages is 0.

    sp_ctrl> qstatus

  3. On the secondary system, shut down the Oracle instance.

    svrmgr1> shutdown

  4. On the secondary system, start the Oracle instance.

    svrmgr1> startup

    Note: This resets the sequence on the secondary system to the top of the cache to synchronize with the primary system.

  5. On the primary system, enable database objects that were disabled.
  6. On the secondary system, start SharePlex.
  7. On the secondary system, stop Post.

    sp_ctrl> stop post

  8. On the secondary system, disable triggers on the tables, or run the sp_add_trigger.sql utility script so that the triggers ignore the SharePlex user.
  9. On the secondary system, disable check constraints and scheduled jobs that perform DML.
  10. On the secondary system, start Post.

    sp_ctrl> start post

  11. On the primary system, view the number of messages in the post queue, and keep checking until the number of messages is 0.

    sp_ctrl> qstatus

  12. When the number of messages is 0, switch users back to the primary system.
  13. On the secondary system, stop Export to prevent accidental changes made on that system from being replicated to the primary system.

    sp_ctrl> stop export

Note: The secondary system is now in a failover-ready state again, with:

  • no users
  • an active configuration
  • disabled or modified triggers, constraints, and scheduled jobs
  • a stopped Export process
Related Documents