The following messages appear in the event_log. Messages keep accumulating in the Capture queue because Read cannot process them and remains idle:
Info 2019-05-25 00:14:37.036597 90375 471185248 Reader exited with code=1, pid = 91966 (from SOURCE_SID)
Error 2019-05-25 00:14:37.029578 91966 3149764448 Reader: EOF not expected. (from SOURCE_SID) [module hpp]
Error 2019-05-25 00:14:37.027077 91966 3149764448 Reader: ^ (from SOURCE_SID) [module hpp]
Error 2019-05-25 00:14:37.024364 91966 3149764448 Reader: (from SOURCE_SID) [module hpp]
Error 2019-05-25 00:14:37.021907 91966 3149764448 Reader: No such file or directory 11070 - partition cache read failed. Error in reading t_part (from SOURCE_SID) [module ord]
Notice 2019-05-25 00:14:37.017724 91966 3149764448 Reader: Oracle Available (from SOURCE_SID) [module ord]
Warning 2019-05-25 00:14:35.760346 91966 3149764448 Reader: Unable to convert queue message, look at reader log for details. Skipping message. (from SOURCE_SID) [module ord]
Warning 2019-05-25 00:14:35.757784 91966 3149764448 Reader: Suspicious magic number 1329877558. Look at reader log for record details (from SOURCE_SID) [module ord]
Warning 2019-05-25 00:14:35.755132 91966 3149764448 Reader: Unable to convert queue message, look at reader log for details. Skipping message. (from SOURCE_SID) [module ord]
Warning 2019-05-25 00:14:35.752445 91966 3149764448 Reader: Suspicious magic number 1329877558. Look at reader log for record details (from SOURCE_SID) [module ord]
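As a quick way to confirm these symptoms, the sketch below counts the three message types quoted above in an event_log file. The function name summarize_read_symptoms is hypothetical, and the search strings are copied from the excerpt above; they may be worded differently in other SharePlex versions.

```shell
# Hypothetical helper: count the symptom messages quoted in the excerpt above.
# Usage: summarize_read_symptoms /path/to/event_log
summarize_read_symptoms() {
  log="$1"
  # grep -c prints the number of matching lines (0 when there are none).
  printf 'magic_number_warnings=%s\n' "$(grep -c 'Suspicious magic number' "$log")"
  printf 'skipped_messages=%s\n' "$(grep -c 'Unable to convert queue message' "$log")"
  printf 'reader_exits=%s\n' "$(grep -c 'Reader exited with code=' "$log")"
}
```

Non-zero counts for all three, together with a growing Capture queue, are consistent with the situation described here.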
The Capture queue is corrupted, so Read cannot process the messages in it.
This can happen when the wrong SharePlex version has been run against the queue.
Check the event_log to see which SharePlex binaries were used previously and which one is in use now:
grep -i shareplex event_log
If it returns more than one SharePlex version, make sure the correct version is the one in use.
For example, SharePlex 9.x generated the messages in the Capture queue, but SharePlex was then shut down and version 8.6.x was started in its place. As a result, version 8.6.x cannot process the messages already in the Capture queue.
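To make a version mismatch easier to spot, the grep output can be reduced to a list of distinct version numbers. This is only a sketch: the exact wording of version lines in the event_log is not shown in this article, so it assumes each version appears as a dotted number following the word "version". The function name list_logged_versions is hypothetical, and grep -o is a GNU/BSD extension.

```shell
# Hypothetical helper: list the distinct version numbers found on event_log
# lines that mention SharePlex.
# ASSUMPTION: the version is logged as "version <dotted number>".
# Usage: list_logged_versions /path/to/event_log
list_logged_versions() {
  grep -i 'shareplex' "$1" |
    grep -oiE 'version[ :=]*[0-9]+(\.[0-9]+)+' |  # keep "version 9.2.1"
    grep -oE '[0-9]+(\.[0-9]+)+' |                # keep "9.2.1"
    sort -u
}
```

More than one line of output means more than one version has written to this event_log. If the function prints nothing, the assumed pattern does not match the log, and the plain grep output should be read manually instead.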
If the correct version of SharePlex is in use, the following steps should resolve the queue corruption:
1. Shut down SharePlex.
2. Invoke the qview program, located in the bin subdirectory under the product directory, and run "fixup all" as shown:
qview -i
qview> fixup all
3. Restart SharePlex.
Points to note:
1. The command is simple. If it contains a syntax error, it will not run.
2. If the command does not resolve the queue corruption (that is, the Read errors continue to occur), contact Support to explore other options.
3. While resolving the corruption, the command may discard corrupted messages. The loss can range from none or minimal to complete data loss. Read cannot process any messages until the corruption is fixed, so these steps must be carried out before replication can run normally, whatever the extent of the loss.
4. Any data loss will leave some of the target tables out of sync.
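To check whether the Read errors persist after the restart (note 2), one option is to list the Reader error lines logged at or after the restart time. The sketch below assumes the event_log lines follow the layout of the excerpt at the top of this article: severity, date, time, then the message. The function name reader_errors_since is hypothetical.

```shell
# Hypothetical helper: print Reader error lines logged at or after a given time.
# ASSUMPTION: fields are "<Severity> <YYYY-MM-DD> <HH:MM:SS.ffffff> ...",
# as in the excerpt above, so timestamps can be compared as plain strings.
# Usage: reader_errors_since /path/to/event_log "2019-05-25 00:15:00"
reader_errors_since() {
  log="$1"
  since="$2"
  awk -v since="$since" \
    '$1 == "Error" && /Reader:/ && ($2 " " $3) >= since' "$log"
}
```

No output suggests Read has not logged new errors since the restart. Any output means the corruption is probably still present and Support should be contacted.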