SharePlex is designed to recover from many types of failures, but a power failure or system panic can result in queue corruption. This can stall replication until the corrupted portion of the queues is removed or the queues are removed entirely, which causes data loss and consequently an out-of-sync target. The corruption occurs because SharePlex queues are partly cached in memory for performance reasons; in the event of a power failure, some messages may not be fully written to disk. The SP_QUE_SYNC parameter controls whether queue data is written to disk at all times or is partly cached in memory. In the former case there is no data loss, but at the cost of additional overhead and slower posting. The following description applies to SharePlex 5.1 and higher; in earlier versions, the parameter can only be set to 1 (on) or 0 (off).
SP_QUE_SYNC instructs the queue module to verify that writes of queue data have reached the disk media before returning. This is not the standard disk-write model in UNIX. In the default OS disk I/O procedure, disk writes go to an internal OS buffer cache and are written to the physical disk later. This distributes the overhead of writing to disk so that processes do not have to wait for the data to reach the media. Once data blocks are written to the buffer cache, applications accessing the data cannot distinguish data in the buffer cache from data on disk; unlike uncommitted data in Oracle, which is visible only to the session that wrote it, data in the buffer cache is visible to every process with access to the file. If a system crash occurs between the time the data blocks are written to the buffer cache and the time they reach the physical media, any data not yet on the media is at risk of being lost upon system recovery.
The SP_QUE_SYNC parameter is implemented as follows:
SP_QUE_SYNC=0
The OS default disk-write behavior described above is used for the SharePlex queue data.
SP_QUE_SYNC=1
Setting SP_QUE_SYNC to 1 causes the O_SYNC flag to be set upon opening each queue data and header file. This flag tells the OS not to return a write call until the data has been successfully written to disk. Without the sync flag ("normal" I/O), space would be allocated for the file data but the file data might not be written due to a system crash, cluster failover, or other critical problem that causes the OS to stop executing.
SP_QUE_SYNC=2 (SharePlex default setting)
Setting the SP_QUE_SYNC parameter to 2 does not set the O_SYNC flag. Instead, normal buffered writes are performed until a queue write COMMIT is called. As part of the write COMMIT, the queue module calls the fsync system call on each queue data file and then on the queue header file. This eliminates redundant sync operations on data that may be rewritten before a subsequent write COMMIT.
Impact of setting SP_QUE_SYNC
Setting SP_QUE_SYNC to a value that causes disk writes to complete before returning might have an impact on SharePlex performance and may slightly increase I/O processing for non-SharePlex processes. The amount of overhead is dependent upon the amount of data in the queue, the filesystem types, and the types of disk drive and disk controller in use.
To set the parameter:
The parameter can be set either from sp_ctrl or in the environment or paramdb file. SharePlex must be bounced (shut down and restarted) for the parameter to take effect. In the following example, it is set from sp_ctrl:
sp_ctrl>set param SP_QUE_SYNC 1
sp_ctrl>shutdown
When SharePlex is restarted, the parameter takes effect. Note that while setting the parameter to 1 considerably reduces the chances of queue corruption, it is not a 100% guarantee.