This section reviews common problems that you could encounter while data is being replicated.
If the issue you are experiencing is not listed in this documentation, search the SharePlex Knowledge Base at:
The Knowledge Base provides filtering options and links to other resources that can help you use and troubleshoot SharePlex.
General issues
Problem | Description | Solution |
---|---|---|
Network problems |
Network factors can cause data to transfer to a target system too slowly, or not at all. |
Make certain that all Export and Import processes are running, and ask the network administrator to check the network for causes of slowdowns. |
Parameter settings |
Some SharePlex parameters, when changed from their default settings, can inhibit replication performance. |
To determine if any parameter settings are responsible for reduced performance, issue the list param modified command to view parameters that have non-default settings. Review the documentation for those parameters in the SharePlex Reference Guide to determine whether a particular setting is affecting replication speed. |
Oracle Capture-related issues
If Capture is stopped, issue the status command in sp_ctrl to verify whether or not it was stopped intentionally by an authorized SharePlex user. If it was, find out why and make certain that it is not stopped for too long.
The following are other reasons why Capture could slow down or stop unexpectedly.
Problem | Description | Solution |
---|---|---|
Excessive chaining |
Excessive chaining reduces the performance of the source database because Oracle must read multiple blocks for a single row. Excessive chaining reduces Post performance on the target system because block fragmentation in the database decreases the speed at which Post can apply SQL statements. |
Reorganize chained tables. |
Ineffective redo log configuration |
Capture reads archive logs if the redo logs wrap before Capture is finished with them, but this can slow down replication and consume disk space. In most cases, the redo logs should be configured so that SharePlex can avoid using the archive logs. |
For more information about how to set up the logs, see the SharePlex Installation and Setup Guide. |
SharePlex has low process priority |
If Capture, Read, or both slow down during hot backups or other Oracle-intensive operations, view the process-priority settings on the system to determine if SharePlex can be assigned more resources. |
Ideally, the SharePlex priority should match Oracle's priority. |
Keys are not being logged |
If PK/UK logging is not enabled for the tables in replication, SharePlex must query the database to get the key values for update and delete operations. |
Enable PK/UK logging. |
Temporary tables and FND tables | Temp tables receive numerous changes that cause replication overhead and performance degradation. | Remove temporary tables from replication. |
archive logs not available |
When Capture detects a log wrap and the archive logs are not available, Capture stops and returns a “Log wrap detected” error. It waits a certain amount of time and then starts again, continuing this process until the logs are restored. |
Restore the archive logs from the one that Capture last processed or point SharePlex to their location with the SP_OCT_ARCH_LOC parameter. To determine the log that Capture needs, use the show capture command with the detail option. For more information, see How to restore Oracle archive logs . |
Compressed archive logs |
Capture cannot read compressed archive logs. |
Uncompress the current log that Capture needs and all those after it so that Capture can process them. To determine the log that Capture needs, use the show capture command with the detail option. |
archive logs in unexpected location |
Capture stops if it cannot locate the archive logs it needs. When the redo logs wrap, Capture looks for the archives in Oracle’s archive-log list, and if the logs are not there, Capture looks in the location specified by the SP_OCT_ARCH_LOC parameter. |
If you are storing the archive logs in a location other than the normal Oracle location, make sure this parameter is set to the full path name of the directory containing the archive logs. |
No access to the redo logs |
Capture stops when it cannot read or find the logs, then it tries to read the log again. |
Find out if someone changed permissions such that SharePlex does not have permission to read the redo logs. |
Capture stops if it is unable to write to the SharePlex logs because of restrictive permissions on the state sub-directory in the variable-data directory, or if there are space restrictions in that directory. |
Verify the permissions and space for the variable-data directory. |
Oracle Post-related issues
There are a number of things that cause the Post process to slow down. SharePlex generates the operation taking too long message when Post exceeds the internally controlled allowable wait time to apply a SQL statement to the target instance. Often it only takes one table to cause a bottleneck. Use the show sql or show post command to find out which table Post is processing, and then check for the following.
Problem | Description | Solution |
---|---|---|
Full table scans |
If a target table does not have a key, Post must perform a full table scan to find the correct row. This slows the Post process. |
Do one of the following:
|
Bitmap indexes on target tables |
Bitmap indexes are beneficial for queries, but they slow down the DML operations applied by Post. |
Avoid using bitmap indexes on target tables in replication, or only use them on tables that do not have frequent DML activity. |
Disk I/O bottleneck |
Disk I/O bottlenecks on the target system are a common cause of slow Post performance. Post can spend a high percentage of its time waiting for Oracle to commit data. The effect is worse when there is a Capture process reading from the same log device in peer-to-peer replication. |
Disk I/O is the nature of the database environment, but you can reduce the bottleneck by placing the redo logs on faster hard drives or on a solid-state drive. |
High number of buffer gets |
Review any table with SQL statements that cause a high number of buffer gets. There should only be two to four buffer gets on an index, depending on its size and whether or not it is a unique index. |
If there are more buffer gets than four, the index probably needs to be rebuilt. You can rebuild an index without the need to reactivate the configuration file. |
Oracle write-rate bottlenecks |
By default, one buffer writer writes all dirty Oracle blocks to disk. Whenever that buffer writer process wakes up to write, it locks portions of shared memory and, in effect, blocks the processes that are either modifying or reading data blocks — including the Post process. |
If this is a constant problem, you can consider increasing the number of writers. See the database documentation. |
Small transaction size |
Normally, Post performs an internal read/release after it receives each COMMIT, which means it purges that data from the queue as part of the checkpoint recovery process. For smaller transactions, this can cause excessive I/O on the target system and hinder the Post process. |
If most of your transactions are small, try changing the value of the Post SP_OPO_READRELEASE_INTERVAL parameter. See the SharePlex Reference Guide for more information about this parameter. |
Sequences not cached |
If the sequences in replication are not cached, they add unnecessarily to the replication volume. |
Cache sequences. If replicated sequences are part of a key, replicate the tables that contain those keys, and remove the sequences from the replication configuration. You should see significant performance improvement. |
Low ulimit |
An error similar to the following means that the system file descriptors setting on the target system needs to be increased: Error 07-24-08 12:11:40.360226 8693 12345 Poster error: /var/quest/ vardir/log/event_log: Too many open files (posting from ora102, queue prodsys, to ora10b) |
Solution: Set the ulimit as close to the optimal value of 1024 as possible. The ulimit can be set either as a system hard limit or a session-based soft limit, as follows:
|
Compare process locks |
The repair command locks all rows that need repair, or the entire table if there are more than 1000 out-of-sync rows, throughout the repair process. This could block Post if Post tries to apply data to the table being repaired. |
If you do not want Post to wait for a repair process to finish, you can kill the compare process. To avoid issues with locks caused by repair processes, consider running the repairs during non-peak hours. |
Difference in Capture and Post speed |
SharePlex reads and processes records from the redo logs faster than it can post those operations to the target database with standard SQL statements. |
|
Full archive log directory |
If Post appears stalled and will not shut down normally, but the only error in the Event Log is sp_opst_mt - operation taking too long, it could be that the archived log directory on the target system is full. If so, then Oracle cannot create new logs, and it suspends processing. Post stalls because it is waiting for Oracle. |
Move some of the old archive logs to another device, or delete them to make room for new ones. |
Commit reduction issues
If commit reduction is not working, it may be because records are not being dispatched quickly enough for there to be a valid message available after the commit. If a valid message is found after the commit message, then Post can skip the commit. Otherwise it issues the commit.
One way to find out if this is the issue is to stop the Post process completely, run a large amount of small transactions through replication, then start the Post process. If you do not see any commit reduction, then this would rule out this issue as the cause.
Post stopped
If Post stops, issue the status command in sp_ctrl to find out why.
• An idle status means there is no data in the post queue to post.
• A stopped by user status means that a SharePlex user stopped the Post process. To find out which user is responsible, view the user issued commands in the Event Log.
• A stopped due to error status means a replication or database error caused Post to stop.
The following are some potential causes for Post to stop unexpectedly.
Problem | Description | Solution |
---|---|---|
Correctable database errors |
Post stops for database errors that can be corrected, so that you can fix the problem without risk of the data going out of synchronization. The errors and the faulty SQL statement are logged to the databaseID_errlog.sql file in the log sub-directory of the variable-data directory on the target system. |
Use the information in the log file to correct the problem, then start Post again. Posting resumes from the point where it stopped. |
Non-correctable database errors |
Some database errors, such as out-of-sync conditions, cannot be corrected. In that case, SharePlex reports the error to the Event Log, writes the error and SQL statement to the databaseID_errlog.sql file, and continues processing. Sometimes, errors that cannot be corrected cause Post to stop. |
If the errors cannot be corrected but you want Post to continue processing, list the errors in the in the databasemsglist file and then set the SP_OPO_CONT_ON_ERR or SP_OPX_CONT_ON_ERR parameter to 1, which directs SharePlex to ignore those errors and continue posting. For more information, see Continue to post when there is a DML error. |
Locks on target tables |
If a target table is locked, the Post process cannot apply a SQL statement and generates an error message. This message could mean that a user, application, or job is accessing the table and might have caused an out-of-sync condition. Or, for an Oracle target, it could mean that a repair command has locked the table. |
Find out why the table was locked and resolve the access issue (unless due to the repair command). You might have to resynchronize the data if DML was performed on the table. For more information, see How to resynchronize source and target tables. |
Configuration deactivation when there are named export queues |
When you are using named export queues, and you deactivate the configuration, Post could stop with the following error instead of posting the remaining data from the queues: sp_opst_mt (for o.qa920-o.qa920 queue q5) 15007 - Can't open poster queue que_NOEXIST: Queue does not exist. |
To start Post and finish replication, shut down SharePlex, then start it again. |
Queue name is too long |
Post stops if a queue name is too long. |
Make certain the name assigned to named queues in the configuration file are no longer than 15 characters. |
Post generates a semaphore error | If Post returns an error message of "shs_SEMERR: an error occurred with the semaphore" on a Windows system, SharePlex requires more semaphores than are available. This usually occurs when named queues or horizontally partitioned replication are being used. You can increase the number of semaphores through the MKS Toolkit. |
|
(Oracle) New tables are not added |
For an Oracle source, tables created after activation are automatically added to replication. |
The auto-add feature is controlled by parameters. Make certain the correct ones are set to achieve your goals. For more information, see Control Oracle DDL replication. |
(Oracle) No more open cursors |
If you see the following error, Post has exceeded the available open cursors. Warning: sp_opst_mt (for o.oracle-o.oracle queue oracle) Post has opened number cursors. No more available cursors! Exiting Post requires a certain number of open cursors to the target database. |
View the current database OPEN_CURSORS value using the following SQL statement: select value from V$PARAMETER where name = 'open_cursors' ; To determine an adequate OPEN_CURSORS value for SharePlex, see Adjust open cursors. |
Other problems and solutions
Problem | Description | Solution |
---|---|---|
SharePlex cannot resolve a machine name | Sometimes machine names cannot be resolved between Unix or Linux and Windows systems. | Add the IP addresses and names of all servers (Unix, Linux, and Windows) in the replication network to the /etc/hosts file on all Unix and Linux machines, and add the same information to the hosts file on all Windows machines. |
sp_cop is using too much CPU time | SharePlex may be performing its overhead activities too frequently. |
Increase the idle time of sp_cop with the SP_COP_IDLETIME parameter. See the parameter documentation in the SharePlex Reference Guide. |
Corrupted source table |
If a source table is corrupted or there is another reason that you do not want the replicated data to be posted to the target database, you can prevent posting for that table without removing it from the active configuration or affecting posting for other objects. |
To disable posting for one or more tables, use the SP_OPO_DISABLE_OBJECT_NUM or SP_OPX_DISABLE_OBJECT_NUM parameter, which can be set to disable posting of both DML and DDL operations for specified object IDs. This parameter is disabled by default. When you are ready to begin posting to the target table again, disable the parameter again by resetting it to 0. For more information, see the parameter documentation in the SharePlex Reference Guide. |
Only some columns were replicated | If you have a table that is configured for vertically partitioned replication but that table name also satisfies a wildcard, the specific listing (with the vertical partitioning) takes precedence over the wildcarded listing. Vertical partitioning cannot be used with full-table replication for the same table. | No action is needed if replication of the specified columns is desired. For more information, see Configure vertically partitioned replication. |
Conflict resolution generates compile errors | If you encounter compile problems with your conflict resolution routines, check whether any tables have the same names as their owners. A known issue in PL/SQL prevents the SharePlex conflict resolution logic from compiling the PL/SQL for tables whose names are the same as their owners. Oracle has stated that the issue will not be fixed. See Oracle TAR 2577886.996 for more information. | This issue does not affect replication; SharePlex replicates data for tables with identical owner and table names. |
Common replication errors
The following table explains many of the common error messages that you might experience during replication.
sp_cop error messages | Cause | Solution |
---|---|---|
sp_cop cannot setup; memory segment n in use. | Processes that access the queues were still running when you last shut down sp_cop. | Kill those processes. SharePlex processes start with sp_. When all have been killed, sp_cop should start. |
Error: sp_cop can't setup shared memory statistics capability - exiting Error: sp_cop(shs) Cannot delete previous memory segment 303. Please check to see if any SharePlex processes are running (ps -ef | grep sp_). If there are some processes running then kill them and restart sp_cop. | There are already one or more SharePlexsp_cop processes pointing to the same variable-data directory. | To run multiple sessions of SharePlex, you need to use separate variable-data directories for each one. For more information, see Run multiple instances of SharePlex.. |
Capture error messages | ||
Capture time limit (300 sec) exceeded. | Capture is not processing records, which could indicate a problem with the redo log. If, after a specific number of seconds, Capture cannot process a record, it stops, logs the record, and returns this message. | See Oracle Capture-related issues for possible causes. If you cannot determine the cause of the problem, call Quest Support before a log wrap occurs. |
(Oracle) Log wrap detected | The redo logs wrapped and Capture cannot locate the archive logs. | If archive logs are available, uncompress and restore them to the archive log directory. SharePlex looks for the archive logs first in the Oracle archive log list, then in the location specified by the SP_OCT_ARCH_LOC parameter. This parameter should always be set to the correct archived log directory. If you use compression for the archive logs, do not compress them until SharePlex is finished processing them. To determine the current log for SharePlex, issue the show capture command with the [detail] option in sp_ctrl on the source system. You can compress any logs that were generated before the current one. Note that this error also occurs when the archived log is corrupted. |
Post error messages | ||
operation taking too long. | It is taking more than the internally allowed time to apply a SQL statement to the target instance. | For more information, see Oracle Post-related issues. |
Rowid not found | SharePlex cannot locate the correct row to update in the target database. | Check for triggers, processes, or users that may have deleted the row on the target. For more information, see Solve synchronization problems. |
Database not available. | Post cannot log into the target database. | Verify that the database is running, and determine whether someone changed the password of the SharePlex database account. |
Oracle-related error messages | ||
Can’t access OBJ$Table | SharePlex cannot access the Data Dictionary, which it must be able to do in order to replicate. | Check the O7_DICTIONARY_ACCESSIBILITY Oracle tuning parameter and make sure it is set to TRUE (the default). |
Forward and backward operation counts do not match... | Messages may be out of sequence. | Find out if Oracle was shut down before SharePlex. This can cause SharePlex to return errors, and in rare cases, corrupt the queues. The proper procedure is to shut down SharePlex and then shut down Oracle. For help resolving this problem, contact Quest Support. |
Error: sp_opst_mt (for o.blues920-o.ora9 queue bluesky) 15033 - Failed to execute SQL on table: QA.T_DEST_1: ORA-00001: unique constraint (.) violated. | A unique constraint was violated on the source system. The change entered the redo log, but Oracle rolled it back. The rollback also entered the redo log. SharePlex detects the constraint violation on the target. | Ignore this message. The tables are still in synchronization because Oracle rolled back the offending operation. This is an unavoidable error because of the way Oracle handles the violation. |
SQL Cache error messages | ||
Warning: Too many concurrent transactions. Will disable the SQL Cache capability. | The SQL Cache size is set to 1, and more cursors are still needed. SharePlex disables the SQL Cache feature in this case. | No action is required if this is your intended configuration. For more information, see Tune SQL Caching. |
Warning: Running out of cursors. Number of cursors opened so far is number. Will attempt to decrease SQL Cache size. or... Notice: Shrinking SQL Cache size to number per session. |
Post detected that it will exceed its maximum number of cursors and is trying to decrease the SQL Cache size. | No action is required unless the value gets to 1 and there are still not enough cursors. For more information, see Tune SQL Caching. |
SQL Cache disabled. | The SQL Cache feature is disabled. | No action is required if this is your intended configuration. For more information, see Tune SQL Caching. |
Hints file error messages (Oracle only) | ||
15050 – hint file not found |
SharePlex looks for the hints.SID file whether you use it or not. The location of this file is the data sub-directory of the SharePlex variable-data directory. If this file gets moved or deleted, SharePlex returns this error message. |
To prevent this message, create a blank hints file in the data sub-directory of the variable-data directory. Name it hints.SID. |
15051 – missing column in the hint file (either table of index name) 15052 – syntax error for tablename 15053 – syntax error for indexname |
The hints file is not configured correctly. | For more information, see Use Oracle INDEX hints. |
15054 – source table’s object_id not found in object cache | The hints file includes a source table that is not in the active configuration. All tables in the hints file must be listed in the active configuration. | If this table is in the configuration, make certain that the owner name and table name are spelled in the hints file the same way as they are in the configuration. |
15055 – more than n valid entries were entered into the hints file | The hints file permits only as many table-index combinations as the value set by the SP_OPO_HINTS_LIMIT parameter. | Either remove some of the table-index combinations or increase the value of SP_OPO_HINTS_LIMIT. For more information about this parameter, see the Post parameters documentation in the SharePlex Reference Guide. |
15056 – error allocation memory for hints | This indicates a system-level memory problem; the hints file itself does not demand a significant amount of memory. | If you believe the system memory is sufficient, stop the Post process and start it again. If you are not using the hints file, you can ignore this error. |
17000 – error opening hint file | SharePlex cannot open the hints file. | Check the file for corruption. If the file is valid, make certain there is sufficient read permission for the Post process. For more information, see Use Oracle INDEX hints. |
Environment-related error messages | ||
sp_opst_mt: pid=num date/time src host/ sid=db01:N2PB /var/quest/vardir/log/ event_log: Too many open files | The system file descriptors setting is not 1024. |
Set the ulimit to 1024. The ulimit can be set either as a system hard limit or a session-based soft limit, as follows:
|
06/29/00 08:05 System call error: sp_ocap(que) (for o.QA11 queue o.QA11) No space left on device devname 06/29/00 08:05 Internal error: sp_ocap (for o.QA11 queue o.QA11) 10705 - writecommit failed que_BUFWRTERR: Error writing buffer to file 06/29/00 08:05 Process exited sp_ocap (for o.QA11 queue o.QA11) [pid = 8692] -exit(1) | SharePlex ran out of space for the queues on the disk. | For more information, see How to resolve disk space shortage. |
gethostbyname name failed - exiting |
The local hosts file is not configured properly. |
The host name is not specified correctly in the hosts file (/etc/hosts on Unix and Linux or hosts on Windows). If this system is not part of a cluster, make corrections to the name in the file. If this system is part of a cluster, the virtual IP address must be mapped to a host alias in the hosts file. For instructions on configuring SharePlex in a cluster, see the SharePlex Installation and Setup Guide. |
Other error messages | ||
Snapshot too old | The read-consistent view required by SharePlex is no longer available. | Increase the size of the rollback segment. |
Parameter paramname does not exist in the paramdefaults file. Using hard coded default value. Please make sure that your param-defaults file is the correct version. |
SharePlex cannot find a parameter that it needs to reference. You have an older version of the param-defaults file than the version of SharePlex that you are running. Someone might have updated the SharePlex binaries using a downloaded patch, and may not have installed the latest param-default file. |
Always check for an updated param-defaults file when you manually update SharePlex. |
Invalid DATE format detected in record with rowid=rowid, on obj object_id. See capture log for detail. | A user or application entered an invalid date value into the Oracle database that bypassed the database’s validity check. | Set the SP_OCT_DEF_ parameters to enable SharePlex to fix the format of dates and times if they are not caught by the database checks. See the SP_OCT_DEF parameters documentation in the SharePlex Reference Guide. |
shs_SHMERR: an error occurred with shared memory. | You ran the qview utility without shutting down SharePlex (sp_cop). | Shut down SharePlex and re-run qview. |