SharePlex is designed to recover from many types of failures, but a power failure or system panic can result in queue corruption. This can stall replication until the corrupted portion of the queues is removed or the queues are removed entirely, which causes data loss and consequently an out-of-sync target. The corruption occurs because SharePlex queues are partly cached in memory for performance reasons; in the event of a power failure, some messages may not be fully written to disk. The SP_QUE_SYNC parameter controls whether queue data is written to disk at all times or is partly cached in memory. In the former case there is no data loss, but at the cost of additional overhead and slower posting. The following description applies to SharePlex 5.1 and higher; in earlier versions, the parameter can only be set to 1 (on) or 0 (off).
SP_QUE_SYNC instructs the queue module to verify that writes of queue data have reached the disk media before returning. This is not the standard disk-write model in UNIX. In the default OS disk I/O procedure, disk writes go to an internal OS buffer cache and are written to the physical disk later. This distributes the overhead of writing to disk so that processes do not have to wait for the data to reach the media. Once data blocks are written to the buffer cache, applications accessing the data cannot distinguish data in the buffer cache from data on disk; unlike uncommitted data in Oracle, which is visible only to the session that wrote it, data in the buffer cache is visible to every process with access to the file. If a system crash occurs between the time the data blocks are written to the buffer cache and the time they reach the physical media, any data not yet on the media is at risk of being lost upon system recovery.
The SP_QUE_SYNC parameter is implemented as follows:
SP_QUE_SYNC=0
The OS default disk-write behavior described above is used for the SharePlex queue data.
SP_QUE_SYNC=1
Setting SP_QUE_SYNC to 1 causes the O_SYNC flag to be set upon opening each queue data and header file. This flag tells the OS not to return a write call until the data has been successfully written to disk. Without the sync flag ("normal" I/O), space would be allocated for the file data but the file data might not be written due to a system crash, cluster failover, or other critical problem that causes the OS to stop executing.
SP_QUE_SYNC=2 (SharePlex default setting)
Setting the SP_QUE_SYNC parameter to 2 does not set the O_SYNC flag. Instead, normal buffered writes are performed until a queue write COMMIT is called. As part of the write COMMIT, the queue module calls the fsync system call on each queue data file and then on the queue header file. This eliminates redundant sync operations on data that may be rewritten before a subsequent write COMMIT.
Impact of setting SP_QUE_SYNC
Setting SP_QUE_SYNC to a value that causes disk writes to complete before returning might have an impact on SharePlex performance and may slightly increase I/O processing for non-SharePlex processes. The amount of overhead is dependent upon the amount of data in the queue, the filesystem types, and the types of disk drive and disk controller in use.
To set the parameter:
The parameter can be set either from sp_ctrl or in the environment or paramdb file. SharePlex must be bounced (shut down and restarted) for the parameter to take effect. In the following example, it is set from sp_ctrl:
sp_ctrl>set param SP_QUE_SYNC 1
sp_ctrl>shutdown
When SharePlex is restarted, the parameter takes effect. Note that while setting the parameter to 1 considerably reduces the chances of queue corruption, it is not a 100% guarantee.