Chatee ahora con Soporte
Chat con el soporte

SharePlex 10.2 - Administration Guide

About this Guide Conventions used in this guide Revision History Overview of SharePlex Run SharePlex Run multiple instances of SharePlex Execute commands in sp_ctrl Set SharePlex parameters Configure data replication Configure replication to and from a container database Configure named queues Configure partitioned replication Configure replication to a change history target Configure a replication strategy Configure DDL replication Configure error handling Configure data transformation Configure security features Start replication on your production systems Monitor SharePlex Prevent and solve replication problems Repair out-of-sync data Tune the Capture process Tune the Post process Recover replication after Oracle failover Make changes to an active replication environment Apply an Oracle application patch or upgrade Back up Oracle data on the source or target Troubleshooting Tips Appendix A: Peer-To-Peer Diagram Appendix B: SharePlex environment variables

Solve replication problems

This section reviews common problems that you could encounter while data is being replicated.

If the issue you are experiencing is not listed in this documentation, search the SharePlex Knowledge Base at:

https://support.quest.com.

The Knowledge Base provides filtering options and links to other resources that can help you use and troubleshoot SharePlex.

General issues

Problem Description Solution
Network problems

Network factors can cause data to transfer to a target system too slowly, or not at all.

Make certain that all Export and Import processes are running, and ask the network administrator to check the network for causes of slowdowns.
Parameter settings

Some SharePlex parameters, when changed from their default settings, can inhibit replication performance.

To determine if any parameter settings are responsible for reduced performance, issue the list param modified command to view parameters that have non-default settings. Review the documentation for those parameters in the SharePlex Reference Guide to determine whether a particular setting is affecting replication speed.

Oracle Capture-related issues

If Capture is stopped, issue the status command in sp_ctrl to verify whether or not it was stopped intentionally by an authorized SharePlex user. If it was, find out why and make certain that it is not stopped for too long.

The following are other reasons why Capture could slow down or stop unexpectedly.

Problem Description Solution
Excessive chaining

Excessive chaining reduces the performance of the source database because Oracle must read multiple blocks for a single row. Excessive chaining reduces Post performance on the target system because block fragmentation in the database decreases the speed at which Post can apply SQL statements.

Reorganize chained tables.
Ineffective redo log configuration

Capture reads archive logs if the redo logs wrap before Capture is finished with them, but this can slow down replication and consume disk space. In most cases, the redo logs should be configured so that SharePlex can avoid using the archive logs.

For more information about how to set up the logs, see the SharePlex Installation and Setup Guide.

SharePlex has low process priority

If Capture, Read, or both slow down during hot backups or other Oracle-intensive operations, view the process-priority settings on the system to determine if SharePlex can be assigned more resources.

Ideally, the SharePlex priority should match Oracle's priority.

Keys are not being logged

If PK/UK logging is not enabled for the tables in replication, SharePlex must query the database to get the key values for update and delete operations.

Enable PK/UK logging.

Temporary tables and FND tables Temp tables receive numerous changes that cause replication overhead and performance degradation. Remove temporary tables from replication.
archive logs not available

When Capture detects a log wrap and the archive logs are not available, Capture stops and returns a “Log wrap detected” error. It waits a certain amount of time and then starts again, continuing this process until the logs are restored.

Restore the archive logs from the one that Capture last processed or point SharePlex to their location with the SP_OCT_ARCH_LOC parameter. To determine the log that Capture needs, use the show capture command with the detail option. For more information, see How to restore Oracle archive logs .

Compressed archive logs

Capture cannot read compressed archive logs.

Uncompress the current log that Capture needs and all those after it so that Capture can process them. To determine the log that Capture needs, use the show capture command with the detail option.

archive logs in unexpected location

Capture stops if it cannot locate the archive logs it needs. When the redo logs wrap, Capture looks for the archives in Oracle’s archive-log list, and if the logs are not there, Capture looks in the location specified by the SP_OCT_ARCH_LOC parameter.

If you are storing the archive logs in a location other than the normal Oracle location, make sure this parameter is set to the full path name of the directory containing the archive logs.

No access to the redo logs

Capture stops when it cannot read or find the logs, then it tries to read the log again.

Find out if someone changed permissions such that SharePlex does not have permission to read the redo logs.

No access to the variable-data directory

Capture stops if it is unable to write to the SharePlex logs because of restrictive permissions on the state sub-directory in the variable-data directory, or if there are space restrictions in that directory.

Verify the permissions and space for the variable-data directory.

Oracle Post-related issues

There are a number of things that cause the Post process to slow down. SharePlex generates the operation taking too long message when Post exceeds the internally controlled allowable wait time to apply a SQL statement to the target instance. Often it only takes one table to cause a bottleneck. Use the show sql or show post command to find out which table Post is processing, and then check for the following.

Problem Description Solution
Full table scans

If a target table does not have a key, Post must perform a full table scan to find the correct row. This slows the Post process.

Do one of the following:
Bitmap indexes on target tables

Bitmap indexes are beneficial for queries, but they slow down the DML operations applied by Post.

Avoid using bitmap indexes on target tables in replication, or only use them on tables that do not have frequent DML activity.

Disk I/O bottleneck

Disk I/O bottlenecks on the target system are a common cause of slow Post performance. Post can spend a high percentage of its time waiting for Oracle to commit data. The effect is worse when there is a Capture process reading from the same log device in peer-to-peer replication.

Disk I/O is the nature of the database environment, but you can reduce the bottleneck by placing the redo logs on faster hard drives or on a solid-state drive.

High number of buffer gets

Review any table with SQL statements that cause a high number of buffer gets. There should only be two to four buffer gets on an index, depending on its size and whether or not it is a unique index.

If there are more buffer gets than four, the index probably needs to be rebuilt. You can rebuild an index without the need to reactivate the configuration file.
Oracle write-rate bottlenecks

By default, one buffer writer writes all dirty Oracle blocks to disk. Whenever that buffer writer process wakes up to write, it locks portions of shared memory and, in effect, blocks the processes that are either modifying or reading data blocks — including the Post process.

If this is a constant problem, you can consider increasing the number of writers. See the database documentation.

Small transaction size

Normally, Post performs an internal read/release after it receives each COMMIT, which means it purges that data from the queue as part of the checkpoint recovery process. For smaller transactions, this can cause excessive I/O on the target system and hinder the Post process.

If most of your transactions are small, try changing the value of the Post SP_OPO_READRELEASE_INTERVAL parameter. See the SharePlex Reference Guide for more information about this parameter.

Sequences not cached

If the sequences in replication are not cached, they add unnecessarily to the replication volume.

Cache sequences. If replicated sequences are part of a key, replicate the tables that contain those keys, and remove the sequences from the replication configuration. You should see significant performance improvement.

Low ulimit

An error similar to the following means that the system file descriptors setting on the target system needs to be increased:

Error 07-24-08 12:11:40.360226 8693 12345 Poster error: /var/quest/ vardir/log/event_log: Too many open files (posting from ora102, queue prodsys, to ora10b)

Solution: Set the ulimit as close to the optimal value of 1024 as possible.

The ulimit can be set either as a system hard limit or a session-based soft limit, as follows:

  • Set a hard limit: (Recommended) A root user and system restart are required to change the hard limit, but the value remains fixed at the correct level to support SharePlex. Consult your System Administrator for assistance.
  • Set a soft limit: A soft limit setting stays in effect only for the duration of the sp_cop session for which it was set, and then it reverts back to a default value that may be lower than the hard limit and too low for SharePlex.
Compare process locks

The repair command locks all rows that need repair, or the entire table if there are more than 1000 out-of-sync rows, throughout the repair process. This could block Post if Post tries to apply data to the table being repaired.

If you do not want Post to wait for a repair process to finish, you can kill the compare process. To avoid issues with locks caused by repair processes, consider running the repairs during non-peak hours.

Difference in Capture and Post speed

SharePlex reads and processes records from the redo logs faster than it can post those operations to the target database with standard SQL statements.

For more information, see Tune the Post process.

Full archive log directory

If Post appears stalled and will not shut down normally, but the only error in the Event Log is sp_opst_mt - operation taking too long, it could be that the archived log directory on the target system is full. If so, then Oracle cannot create new logs, and it suspends processing. Post stalls because it is waiting for Oracle.

Move some of the old archive logs to another device, or delete them to make room for new ones.

Commit reduction issues

If commit reduction is not working, it may be because records are not being dispatched quickly enough for there to be a valid message available after the commit. If a valid message is found after the commit message, then Post can skip the commit. Otherwise it issues the commit.

One way to find out if this is the issue is to stop the Post process completely, run a large amount of small transactions through replication, then start the Post process. If you do not see any commit reduction, then this would rule out this issue as the cause.

Post stopped

If Post stops, issue the status command in sp_ctrl to find out why.

• An idle status means there is no data in the post queue to post.

• A stopped by user status means that a SharePlex user stopped the Post process. To find out which user is responsible, view the user issued commands in the Event Log.

• A stopped due to error status means a replication or database error caused Post to stop.

The following are some potential causes for Post to stop unexpectedly.

Problem Description Solution
Correctable database errors

Post stops for database errors that can be corrected, so that you can fix the problem without risk of the data going out of synchronization. The errors and the faulty SQL statement are logged to the databaseID_errlog.sql file in the log sub-directory of the variable-data directory on the target system.

Use the information in the log file to correct the problem, then start Post again. Posting resumes from the point where it stopped.

Non-correctable database errors

Some database errors, such as out-of-sync conditions, cannot be corrected. In that case, SharePlex reports the error to the Event Log, writes the error and SQL statement to the databaseID_errlog.sql file, and continues processing. Sometimes, errors that cannot be corrected cause Post to stop.

If the errors cannot be corrected but you want Post to continue processing, list the errors in the in the databasemsglist file and then set the SP_OPO_CONT_ON_ERR or SP_OPX_CONT_ON_ERR parameter to 1, which directs SharePlex to ignore those errors and continue posting. For more information, see Continue to post when there is a DML error.

Locks on target tables

If a target table is locked, the Post process cannot apply a SQL statement and generates an error message. This message could mean that a user, application, or job is accessing the table and might have caused an out-of-sync condition. Or, for an Oracle target, it could mean that a repair command has locked the table.

Find out why the table was locked and resolve the access issue (unless due to the repair command). You might have to resynchronize the data if DML was performed on the table. For more information, see How to resynchronize source and target tables.

Configuration deactivation when there are named export queues

When you are using named export queues, and you deactivate the configuration, Post could stop with the following error instead of posting the remaining data from the queues:

sp_opst_mt (for o.qa920-o.qa920 queue q5) 15007 - Can't open poster queue que_NOEXIST: Queue does not exist.

To start Post and finish replication, shut down SharePlex, then start it again.

Queue name is too long

Post stops if a queue name is too long.

Make certain the name assigned to named queues in the configuration file are no longer than 15 characters.

Post generates a semaphore error If Post returns an error message of "shs_SEMERR: an error occurred with the semaphore" on a Windows system, SharePlex requires more semaphores than are available. This usually occurs when named queues or horizontally partitioned replication are being used. You can increase the number of semaphores through the MKS Toolkit.
  1. In the Windows Control Panel, open Configure PTC MKS Toolkit.
  2. Select Runtime Settings.
  3. From Category, select Semaphores.
  4. Set the following semaphores:

    Max Number of Semaphore IDs: 10

    Max Number of Semaphores: 512

    Max Number of Semaphore Undo entries: 90

(Oracle) New tables are not added

For an Oracle source, tables created after activation are automatically added to replication.

The auto-add feature is controlled by parameters. Make certain the correct ones are set to achieve your goals. For more information, see Control Oracle DDL replication.

(Oracle) No more open cursors

If you see the following error, Post has exceeded the available open cursors.

Warning: sp_opst_mt (for o.oracle-o.oracle queue oracle) Post has opened number cursors. No more available cursors! Exiting

Post requires a certain number of open cursors to the target database.

View the current database OPEN_CURSORS value using the following SQL statement:

select value from V$PARAMETER where name = 'open_cursors' ;

To determine an adequate OPEN_CURSORS value for SharePlex, see Adjust open cursors.

Other problems and solutions

Problem Description Solution
SharePlex cannot resolve a machine name Sometimes machine names cannot be resolved between Unix or Linux and Windows systems. Add the IP addresses and names of all servers (Unix, Linux, and Windows) in the replication network to the /etc/hosts file on all Unix and Linux machines, and add the same information to the hosts file on all Windows machines.
sp_cop is using too much CPU time SharePlex may be performing its overhead activities too frequently.

Increase the idle time of sp_cop with the SP_COP_IDLETIME parameter. See the parameter documentation in the SharePlex Reference Guide.

Corrupted source table

If a source table is corrupted or there is another reason that you do not want the replicated data to be posted to the target database, you can prevent posting for that table without removing it from the active configuration or affecting posting for other objects.

To disable posting for one or more tables, use the SP_OPO_DISABLE_OBJECT_NUM or SP_OPX_DISABLE_OBJECT_NUM parameter, which can be set to disable posting of both DML and DDL operations for specified object IDs. This parameter is disabled by default. When you are ready to begin posting to the target table again, disable the parameter again by resetting it to 0. For more information, see the parameter documentation in the SharePlex Reference Guide.

Only some columns were replicated If you have a table that is configured for vertically partitioned replication but that table name also satisfies a wildcard, the specific listing (with the vertical partitioning) takes precedence over the wildcarded listing. Vertical partitioning cannot be used with full-table replication for the same table. No action is needed if replication of the specified columns is desired. For more information, see Configure vertically partitioned replication.
Conflict resolution generates compile errors If you encounter compile problems with your conflict resolution routines, check whether any tables have the same names as their owners. A known issue in PL/SQL prevents the SharePlex conflict resolution logic from compiling the PL/SQL for tables whose names are the same as their owners. Oracle has stated that the issue will not be fixed. See Oracle TAR 2577886.996 for more information. This issue does not affect replication; SharePlex replicates data for tables with identical owner and table names.

Common replication errors

The following table explains many of the common error messages that you might experience during replication.

sp_cop error messages Cause Solution
sp_cop cannot setup; memory segment n in use. Processes that access the queues were still running when you last shut down sp_cop. Kill those processes. SharePlex processes start with sp_. When all have been killed, sp_cop should start.
Error: sp_cop can't setup shared memory statistics capability - exiting Error: sp_cop(shs) Cannot delete previous memory segment 303. Please check to see if any SharePlex processes are running (ps -ef | grep sp_). If there are some processes running then kill them and restart sp_cop. There are already one or more SharePlexsp_cop processes pointing to the same variable-data directory. To run multiple sessions of SharePlex, you need to use separate variable-data directories for each one. For more information, see Run multiple instances of SharePlex..
Capture error messages    
Capture time limit (300 sec) exceeded. Capture is not processing records, which could indicate a problem with the redo log. If, after a specific number of seconds, Capture cannot process a record, it stops, logs the record, and returns this message. See Oracle Capture-related issues for possible causes. If you cannot determine the cause of the problem, call Quest Support before a log wrap occurs.
(Oracle) Log wrap detected The redo logs wrapped and Capture cannot locate the archive logs. If archive logs are available, uncompress and restore them to the archive log directory. SharePlex looks for the archive logs first in the Oracle archive log list, then in the location specified by the SP_OCT_ARCH_LOC parameter. This parameter should always be set to the correct archived log directory. If you use compression for the archive logs, do not compress them until SharePlex is finished processing them. To determine the current log for SharePlex, issue the show capture command with the [detail] option in sp_ctrl on the source system. You can compress any logs that were generated before the current one. Note that this error also occurs when the archived log is corrupted.
Post error messages    
operation taking too long. It is taking more than the internally allowed time to apply a SQL statement to the target instance. For more information, see Oracle Post-related issues.
Rowid not found SharePlex cannot locate the correct row to update in the target database. Check for triggers, processes, or users that may have deleted the row on the target. For more information, see Solve synchronization problems.
Database not available. Post cannot log into the target database. Verify that the database is running, and determine whether someone changed the password of the SharePlex database account.
Oracle-related error messages    
Can’t access OBJ$Table SharePlex cannot access the Data Dictionary, which it must be able to do in order to replicate. Check the O7_DICTIONARY_ACCESSIBILITY Oracle tuning parameter and make sure it is set to TRUE (the default).
Forward and backward operation counts do not match... Messages may be out of sequence. Find out if Oracle was shut down before SharePlex. This can cause SharePlex to return errors, and in rare cases, corrupt the queues. The proper procedure is to shut down SharePlex and then shut down Oracle. For help resolving this problem, contact Quest Support.
Error: sp_opst_mt (for o.blues920-o.ora9 queue bluesky) 15033 - Failed to execute SQL on table: QA.T_DEST_1: ORA-00001: unique constraint (.) violated. A unique constraint was violated on the source system. The change entered the redo log, but Oracle rolled it back. The rollback also entered the redo log. SharePlex detects the constraint violation on the target. Ignore this message. The tables are still in synchronization because Oracle rolled back the offending operation. This is an unavoidable error because of the way Oracle handles the violation.
SQL Cache error messages    
Warning: Too many concurrent transactions. Will disable the SQL Cache capability. The SQL Cache size is set to 1, and more cursors are still needed. SharePlex disables the SQL Cache feature in this case. No action is required if this is your intended configuration. For more information, see Tune SQL Caching.

Warning: Running out of cursors. Number of cursors opened so far is number. Will attempt to decrease SQL Cache size.

or...

Notice: Shrinking SQL Cache size to number per session.

Post detected that it will exceed its maximum number of cursors and is trying to decrease the SQL Cache size. No action is required unless the value gets to 1 and there are still not enough cursors. For more information, see Tune SQL Caching.
SQL Cache disabled. The SQL Cache feature is disabled. No action is required if this is your intended configuration. For more information, see Tune SQL Caching.
Hints file error messages (Oracle only)    
15050 – hint file not found

SharePlex looks for the hints.SID file whether you use it or not. The location of this file is the data sub-directory of the SharePlex variable-data directory. If this file gets moved or deleted, SharePlex returns this error message.

To prevent this message, create a blank hints file in the data sub-directory of the variable-data directory. Name it hints.SID.

15051 – missing column in the hint file (either table of index name)

15052 – syntax error for tablename

15053 – syntax error for indexname

The hints file is not configured correctly. For more information, see Use Oracle INDEX hints.
15054 – source table’s object_id not found in object cache The hints file includes a source table that is not in the active configuration. All tables in the hints file must be listed in the active configuration. If this table is in the configuration, make certain that the owner name and table name are spelled in the hints file the same way as they are in the configuration.
15055 – more than n valid entries were entered into the hints file The hints file permits only as many table-index combinations as the value set by the SP_OPO_HINTS_LIMIT parameter. Either remove some of the table-index combinations or increase the value of SP_OPO_HINTS_LIMIT. For more information about this parameter, see the Post parameters documentation in the SharePlex Reference Guide.
15056 – error allocation memory for hints This indicates a system-level memory problem; the hints file itself does not demand a significant amount of memory. If you believe the system memory is sufficient, stop the Post process and start it again. If you are not using the hints file, you can ignore this error.
17000 – error opening hint file SharePlex cannot open the hints file. Check the file for corruption. If the file is valid, make certain there is sufficient read permission for the Post process. For more information, see Use Oracle INDEX hints.
Environment-related error messages    
sp_opst_mt: pid=num date/time src host/ sid=db01:N2PB /var/quest/vardir/log/ event_log: Too many open files The system file descriptors setting is not 1024.

Set the ulimit to 1024.

The ulimit can be set either as a system hard limit or a session-based soft limit, as follows:

  • Set a hard limit: (Recommended) A root user and system restart are required to change the hard limit, but the value remains fixed at the correct level to support SharePlex. Consult your System Administrator for assistance.
  • Set a soft limit: A soft limit setting stays in effect only for the duration of the sp_cop session for which it was set, and then it reverts back to a default value that may be lower than the hard limit and too low for SharePlex.
06/29/00 08:05 System call error: sp_ocap(que) (for o.QA11 queue o.QA11) No space left on device devname 06/29/00 08:05 Internal error: sp_ocap (for o.QA11 queue o.QA11) 10705 - writecommit failed que_BUFWRTERR: Error writing buffer to file 06/29/00 08:05 Process exited sp_ocap (for o.QA11 queue o.QA11) [pid = 8692] -exit(1) SharePlex ran out of space for the queues on the disk. For more information, see How to resolve disk space shortage.
gethostbyname name failed - exiting

The local hosts file is not configured properly.

The host name is not specified correctly in the hosts file (/etc/hosts on Unix and Linux or hosts on Windows). If this system is not part of a cluster, make corrections to the name in the file. If this system is part of a cluster, the virtual IP address must be mapped to a host alias in the hosts file. For instructions on configuring SharePlex in a cluster, see the SharePlex Installation and Setup Guide.

Other error messages    
Snapshot too old The read-consistent view required by SharePlex is no longer available. Increase the size of the rollback segment.
Parameter paramname does not exist in the paramdefaults file. Using hard coded default value. Please make sure that your param-defaults file is the correct version.

SharePlex cannot find a parameter that it needs to reference. You have an older version of the param-defaults file than the version of SharePlex that you are running. Someone might have updated the SharePlex binaries using a downloaded patch, and may not have installed the latest param-default file.

Always check for an updated param-defaults file when you manually update SharePlex.

Invalid DATE format detected in record with rowid=rowid, on obj object_id. See capture log for detail. A user or application entered an invalid date value into the Oracle database that bypassed the database’s validity check. Set the SP_OCT_DEF_ parameters to enable SharePlex to fix the format of dates and times if they are not caught by the database checks. See the SP_OCT_DEF parameters documentation in the SharePlex Reference Guide.
shs_SHMERR: an error occurred with shared memory. You ran the qview utility without shutting down SharePlex (sp_cop). Shut down SharePlex and re-run qview.

Solve Oracle DDL replication problems

This section reviews many common problems and solutions encountered when replication of Oracle DDL is active. For more information about SharePlex DDL support, see Configure DDL replication.

If the issue you are experiencing is not listed in this documentation, search the SharePlex Knowledge Base at:

https://support.quest.com.

The Knowledge Base provides filtering options and links to other resources that can help you use and troubleshoot SharePlex.

DDL is not replicating

Only some Oracle DDL is enabled by default. Other DDL support must be explicitly enabled with parameter settings.

The SP_OPO_STOP_ON_DDL_ERR parameter is defaulted to direct the Post process to stop if there is an error applying the DDL, to enable you to correct the problem to keep the databases synchronized. This parameter should be set to 1 (on). When enabled, messages such as the following notify you that DDL was skipped.

Skipping generic 9i DDL operation, schema (bob) could not be set.

FAILED DDL Replication for "create user bob."

Solution: Make certain the following parameters are set to appropriately. See Control Oracle DDL replication for more information about how to use these parameters to configure DDL replication.

  • SP_OCT_REPLICATE_DDL
  • SP_OCT_AUTOADD_ENABLE
  • SP_OCT_AUTOADD_MVIEW
  • SP_OCT_AUTOADD_SEQ
  • SP_OCT_REPLICATE_ALL_DDL

Replicated DDL is not completely displayed in the Event Log

SharePlex prints replicated DDL to the Event Log, but it truncates statements longer than 2,000 characters. Only the first 2,000 characters are recorded in the log.

Solution: None required.

Solve queue problems

This topic helps you solve replication problems that may be related to the SharePlex queues.

If the issue you are experiencing is not listed in this documentation, search the SharePlex Knowledge Base at:

https://support.quest.com.

The Knowledge Base provides filtering options and links to other resources that can help you use and troubleshoot SharePlex.

SharePlex is running out of disk space

Data accumulating in the queues can cause SharePlex to run out of disk space. The following are some possible causes and solutions.

Problem Description Solution
Stopped replication processes

When replication processes stop, data accumulates in the queues.

If a process was stopped by a user, find out why and then start it as soon as possible so that SharePlex can process the accumulated data. If a process stopped because of an error, view the Event log with the show log command to find out what happened, then resolve the problem so that processing can continue and the queue backlog can be reduced.

Large operations

The Post queue may become large when storing large transactions for which there has not yet been a COMMIT. To allow for rollbacks and data recovery, SharePlex retains data in the post queue until it receives the COMMIT.

View the post queue with the qstatus command. If the value in the Backlog (messages) field remains constant or is shrinking, while the value in the Number of messages field is increasing, Post is waiting for a COMMIT before releasing the data.

Use the show post detail command to verify that Post is processing transactions normally. If possible, set the COMMIT point for your application to 500 to generate smaller SQL statements for SharePlex to process. Also, consider creating a configuration that uses named post queues which isolate tables that are known for long transactions. For more information, see Configure named post queues.

If the accumulation of messages in the Post queue is threatening to cause disk space issues, you can stop Import until Post can clear out some of the operations from other transactions. For more information, see How to resolve disk space shortage.

(Oracle) Capture is processing archive logs

If Capture is processing archive logs, the capture queue consumes disk space while the archived records are being processed.

If the location of Capture in the logs is far behind that of the database, the latency between the source and target data may be too large to be acceptable to the target users. Rather than add more disk space to the variable-data directory, it might be more practical to run ora_cleansp to clean up the replication environment and queues, resynchronize the data, and reactivate the configuration.

If SharePlex continues to run out of disk space because Capture is processing the archive logs, the SharePlex Support team can help you tune the performance of Capture and adjust the redo configuration.

Unplanned increase in source transaction activity

An unplanned increase in activity on the source system can cause data to accumulate in the Post queue and approach the maximum amount of allocated disk space.

For more information, see How to resolve disk space shortage.

There is a “failure to write and open queue” error

If there is a sequence of messages in the Event Log similar to the following, stop and start sp_cop.

1 08-06-12 13:20:17.089485 2384 1 sp_ordr(rim) (for o.user queue o.user) Error opening output queue rv=9 que_open(-,writeuser+ X,0x0a02d364+PR+o.user+sp_ordr+o.user)

Notice 08-06-12 13:20:17.089485 2384 1 sp_ordr (for o.user queue o.user) data route to a02d364.48.7e failed err=100

Error 08-06-12 13:20:17.089485 2384 1 sp_ordr (for o.user queue o.user) 11004 - sp_ordr failure writing to queue(s)

Notice 08-06-12 13:20:17.089485 2384 1 sp_ordr (for o.user queue o.user) Exit sp_ordr to retry rim-write.

Info 08-06-12 13:20:17.089485 2384 1 Process exited sp_ordr (for o.user queue o.user) [pid = 8183] - exit(1)

The queues are corrupted

If the system that hosts SharePlex fails, one or more SharePlex queues can become corrupted, and there will be errors reading from and writing to them. In that case, the purge config and abort config commands cannot be used, because they rely on the queues to operate.

Solution: Contact SharePlex Support to resolve queue corruption issues.

The post queue seems too large

If the size of the post queue seems too large relative to the number of messages it contains, that is not unusual. The SharePlex post queue actually consists of a number of sub-queues, each approximately corresponding to a user session on the source system. A sub-queue can have one or more files associated with it, each with a default size of 8 MB. If the entire 8 MB of size is not used, the file remains on the system even after the data has been posted and read-released.

Solution: The size for the post queue in a status display is the actual disk space used by all of the queue files. You can eliminate obsolete files that have been read-released by using the trim command in the SharePlex qview utility. Thecommand preserves files containing data not yet posted and committed to the target database. See the SharePlex Reference Guide for more information about the qview utility.

Solve synchronization problems

This section reviews the causes and solutions for common synchronization problems. If you try these solutions and are still having problems, contact Quest Support.

For more information, see Understand the concept of synchronization.

If the issue you are experiencing is not listed in this documentation, search the SharePlex Knowledge Base at:

https://support.quest.com.

The Knowledge Base provides filtering options and links to other resources that can help you use and troubleshoot SharePlex.

How SharePlex reports out-of-sync conditions

For all objects except those involved in transformation, SharePlex verifies that the source and target data in a given operation are synchronized before posting the replicated data to the target. SharePlex does not verify synchronization if transformation is being used because:

  • Transformation changes the target data, so before and after images cannot be compared.
  • The transformation routine posts the data, not SharePlex.

When SharePlex determines that source and target data are different, it generates error conditions but continues to post other data from the post queue. To direct Post to stop processing altogether when it detects an out-of-sync condition, change the SP_OPO_OUT_OF_SYNC_SUSPEND (Oracle) or SP_OPX_OUT_OF_SYNC_SUSPEND (Open Target) parameter. See the Post parameter documentation in the SharePlex Reference Guide.

When an out-of-sync condition occurs, the Post process logs a message in the Status Database and also to the Event Log. To view these files in sp_ctrl:

  • Status Database: Use the show statusdb or show sync command.
  • Event Log: Use the show log command.

Use these commands frequently to monitor for out-of-sync errors.

The following is an example of how SharePlex reports an out-of-sync condition.

sp_ctrl (irvspxu14:8567)> show sync

Out Of Sync Status Database irvspxu14

Count Details

----- --------------------

3 Table "SCOTT"."TG_TEST1" out of sync for queue irvspxu14 since 16-Jun-

08 17:06:33

3 Table "SCOTT"."TG_TEST2" out of sync for queue irvspxu14 since 17-Jun-

08 15:47:58

1 Table "SCOTT"."TG_TEST3" out of sync for queue irvspxu14 since 17-Jun-

08 15:52:03

When data goes out of synchronization, SharePlex logs the failed SQL statements to the database_ID_errlog.sql file in the data sub-directory of the SharePlex variable-data directory.

Important: If you see an out-of-sync message in the Status Database and in the Event Log, but there is no record in the database_ID_errlog.sql file for the transaction, do not ignore those messages. They could be associated with a ROLLBACK. Regardless of whether or not a transaction is rolled back, SharePlex still compares the pre-images of the source and target rows. If they are different, that indicates that the data is out of synchronization. Only when a transaction is committed on the source but fails on the target does SharePlex log it to the database_ID_errlog.sql file, to give you a record of the statement that should have been applied as a tool for problem solving and for manually applying the statement if appropriate. Rolled back statements are canceled operations, and therefore not logged on the target.

Detect false out-of-sync conditions

Sometimes an out-of-sync message can be false, and the data is not out-of-sync. You can compare the source and target data by using the compare command, or you can perform the following comparison manually.

To compare with the compare command

See the compare commands in the SharePlex Reference Guide.

To verify that data is out-of-sync

  1. Get the rowid of the affected row from the database_ID_errlog.sql file in the variable-data directory on the target system.
  2. Using the rowid in the WHERE clause, run a SELECT query on the source table to get the row data for this rowid.
  3. Run a SELECT query on the target table, using the row data that you got from the source query in the WHERE clause.
  4. Compare the query results. If they are different, the rows are out of synchronization. If they are the same, the rows are synchronized.

Common out-of-sync conditions and solutions

The following are common ways in which data goes out of synchronization. In most cases, SharePlex detects out-of-sync conditions and returns an error message, but there are some situations where an out-of-sync condition is hidden, and SharePlex will not return an error.

Out of sync condition Description Solution

Incorrect cleanup procedure

If one of the database_cleansp cleanup utilities was run on some, but not all, systems associated with an active configuration, SharePlex perceives an out-of-sync condition.

Determine whether or not the cleanup utility was run on all systems by viewing the Event Log. The log tells you if and when it was run on each system. It also tells you if and when the configuration was activated. You can compare the times for those events to determine what happened.

If the cleanup was not completed on all replication systems for this configuration, run the cleanup utility on all systems. Because the cleanup removes replication queues and processes and deactivates the configuration, you must perform initial synchronization again.

DDL changes

Some DDL-related causes of out-of-sync conditions are:

  • Non-replicated DDL changes are made to source objects, but the configuration was not reactivated so that the objects can be re-analyzed.
  • DDL that SharePlex replicates also gets performed manually on the target.

For a list of supported DDL, refer to the SharePlex Release Notes.

To undo duplicate DDL changes made manually and also by SharePlex:

  1. Stop the Post process (it might already be stopped).

    sp_ctrl(sysB)> stop post

  2. Alter the target table to undo the DDL changes.
  3. Start Post and let SharePlex post the replicated DDL (which is still in the post queue).

    sp_ctrl(sysB)> start post

 

DML changes made directly to the target If applications or users make DML changes to the tables on the target, the results of those changes will cause hidden out-of-sync conditions until Post attempts to apply a replicated change to the affected rows. When the change is applied, SharePlex returns an out-of-sync error.

Prevent all DML access to the target tables that are in replication.

You can use the compare and repair commands to compare tables for out-of-sync rows and then repair those rows. For more information, see the command documentation in the SharePlex Reference Guide.

Triggers on target objects

Triggers must be disabled on target objects. The triggers fire on the source system and SharePlex replicates their effects to the target.

If triggers have fired on the target system, the objects changed by the triggers are out of synchronization and must be resynchronized. For more information, see How to resynchronize source and target tables.

To disable the effects of triggers on an Oracle target

To disable the effects of triggers on target Oracle objects after the data is resynchronized, you can either of the following:

  • Run the sp_add_trigger.sql script, which directs triggers to ignore the SharePlex Oracle user. See the utilities documentation in the SharePlex Reference Guide for more information about the trigger scripts.

  • Disable the triggers if they are not needed.
Unnecessary constraints

The only constraints that are necessary on target tables in a one-way replication configuration are primary and unique key constraints. CHECK constraints are not necessary on the target because they are satisfied on the source. FOREIGN KEY and ON DELETE CASCADE constraints are also satisfied on the source, and SharePlex replicates the child operations to the correct tables on the target.

For an Oracle target, you can leave ON DELETE CASCADE constraints enabled if you configure SharePlex to ignore them.

See Set up Oracle database objects for replicationSet up Oracle database objects for replication

Duplicate entries in the configuration file. Duplicate entries, where the source, target, and routing map are identical, cause double posting on the target.

Copy the configuration file to a new file.

Find and remove any duplicates in the new file.

Perform a resynchronization and reactivation. For more information, see How to resynchronize source and target tables.

Lack of disk space

Data goes out of synchronization if user transactions continue when SharePlex does not have enough room to accommodate them in the queues. This can happen if:

  • The network or target system was unavailable and too much data accumulated in the export queue.
  • The target was unavailable and too much data accumulated in the Post queue.
  • Capture lost pace with the logging of source transactions. In this case, data accumulates in the capture queue.
  • A SharePlex process was stopped, but not restarted.
  • The flush command was issued, but Post was not started again.
For more information, see How to resolve disk space shortage.

Changes to column conditions in horizontally partitioned replication

Out-of-sync conditions can result from the use of horizontally partitioned replication in the following cases:

  • A column condition value is updated and the new value no longer satisfies the row selection criteria.
  • A row that does not satisfy the column condition is updated to satisfy the condition.

Create column conditions so that partition shift does not occur. For more information, see Configure horizontally partitioned replication.

Not reactivating after a configuration change.

If a table was added to the configuration, but the configuration was not reactivated, operations on that table are not being replicated.

Note: For Oracle source tables, the auto-add feature is enabled by default, and any new table whose name satisfies a wildcard in the configuration file gets automatically added to replication. For more information, see Control Oracle DDL replication.

Resynchronize the affected tables, then reactivate the configuration so that SharePlex can update its object cache. For more information, see How to resynchronize source and target tables.

Queue corruption

If the SharePlex queues are corrupted, such as if there is a system failure, the data in them can be lost. This requires a resynchronization.

For more information, see How to resynchronize source and target tables.

(Oracle) To avoid queue corruption during system failure, you can use the parameter SP_QUE_SYNC. See the Queue parameters documentation in the SharePlex Reference Guide.

Oracle-related out-of-sync conditions and solutions

The following are common synchronization issues that relate specifically to replication between Oracle databases. In most cases, SharePlex detects out-of-sync conditions and returns an error message, but there are some situations where an out-of-sync condition is hidden, and SharePlex will not return an error.

Out of sync condition Description Solution
DDL changes

Some DDL-related causes of out-of-sync conditions are:

  • Non-replicated DDL changes are made to source objects, but the configuration was not reactivated so that the objects can be re-analyzed.
  • DDL that SharePlex replicates also gets performed manually on the target.

For a list of supported DDL, refer to the SharePlex Release Notes.

To undo duplicate DDL changes made manually and also by SharePlex:

  1. Stop the Post process (it might already be stopped).

    sp_ctrl(sysB)> stop post

  2. Alter the target table to undo the DDL changes.
  3. Start Post and let SharePlex post the replicated DDL (which is still in the post queue).

    sp_ctrl(sysB)> start post

 

Inadequate or non-existing conflict resolution routines

Conflict resolution procedures are required in a peer to peer (active-active) configuration. SharePlex uses the conflict resolution procedures to determine which operation to post when the same data change is received from different systems.

Revise and test the conflict resolution procedures, then resynchronize the data. For more information, see How to resynchronize source and target tables.
Log wrap

If the redo logs wrap before Capture can process the data it needs, the data can go out of synchronization if archived logging is not enabled or if the archive logs needed by Capture were removed. (Normally, Capture would access the archive logs and continue replicating.)

  • See Correct the problem first
  • If archiving is not enabled, there are no archive logs for SharePlex to read. Data lost after the log wrap cannot be recovered. Enable archived logging, and resynchronize the data. For more information, see How to resynchronize source and target tables.
  • If SharePlex is many logs behind Oracle, consider resynchronizing the data instead of restoring the logs. It might take less time than Capture would take to process the missing records from the archive logs. In addition, it eliminates the possibility that the capture queue could exceed free disk space while processing the archive logs. You can base your decision on the size of the redo logs and the number of tables being replicated, both of which determine how much information Capture must process. Also take into consideration how much latency the users of the target data can tolerate.
  • If archive logs are available, copy the appropriate logs back to the archived-log directory on the source system, or use the SP_OCT_ARCH_LOC parameter to point SharePlex to their location.
LONG columns

If a table has no primary or unique key, SharePlex builds a simulated key based on all of the columns except the LONG and LOB columns. If the LONG columns in the target rows are the only columns that contain unique values, multiple rows could meet the criteria for the simulated key. SharePlex could apply the UPDATE or DELETE to the wrong row without the error being detected, causing the table to go out of synchronization without an error message.

If you can create a key from columns that ensure uniqueness on the target table(s), you can avoid this kind of out-of-sync condition. After you create the key, resynchronize and re-activate those objects so that SharePlex can update its object cache.

If adding a primary or unique key is not possible (such as when packaged applications are in use), uniqueness of rows on the target system cannot be ensured.

Changes to keys

If tables use automatically generated numerical sequences as keys, and a key value changes, this may cause duplicity on the target system. If the new value already exists as a key in another row on the target system, SharePlex returns a unique-key constraint violation and out-of-sync error. This can happen when you update values using an x +n formula, where n is an incremental increase. It is possible that one of the x +n values will equal an existing value.

Create the sequences so that the updates cannot result in a duplication of keys on the target system.
DBMS_SCHEDULER procedures running on source and/or target system The effects of these procedures, such as objects being created, manipulated and dropped outside replication, may not be visible to replication or may not be supported, resulting in data changes that cause out-of-sync conditions. Exclude source and target objects from the jobs.
Virtual private database If replicating data that is configured as a virtual private database, the SharePlex database user may not have the access rights to capture the data. Any changes to that data will not be reflected on the target.

If you do not need that data replicated to the target, you can filter it out of the SharePlex configuration by means of partitioned replication.

If you want the data to be replicated, assign the SharePlex user the correct access rights.

PK/UK logging not enabled Certain replication problems can be prevented by logging key values. SharePlex fetches key values based on the rowid. Any operation that changes the rowid, such as ALTER TABLE...MOVE, can cause the wrong key values to be used for subsequent DML operations. SharePlex recommends that both primary key and unique key supplemental logging be set, or that a supplemental log group on unique columns be defined for every table in replication.

Correct the problem first

If data is out-of-synchronization, do the following before you resynchronize the data:

  1. Determine why it happened before you resynchronize the data. Otherwise, the problem can repeat itself and result in more data going out of synchronization.
  2. Stop the Post process to prevent further errors. If the accumulation of messages in the Post queue is threatening to cause disk space issues, and if there is enough disk space available on the source system, you can stop Import until Post can clear out some of the operations from other transactions. For more information, see How to resolve disk space shortage.
  3. View the Status Database and the Event Log to determine the cause of the problem.
  4. Resolve the problem.

Resynchronize data

For more information, see How to resynchronize source and target tables.

Documentos relacionados

The document was helpful.

Seleccionar calificación

I easily found the information I needed.

Seleccionar calificación