INTERNAL - Simplest way to detect and resolve queue corruption (4297815)

Title

INTERNAL - Simplest way to detect and resolve queue corruption
Description

Any of the following would indicate queue corruption:

Post queue corruption:

10/03/06 09:26 System call error: No such file or directory bu_rd.bu_fd [sp_opst(que)/27386]
10/03/06 09:26 Notice: reseq/getdata bad hdr magic=0 query_seq=509335035 seq=0 [sp_opst(que)/27387]
10/03/06 09:26 Notice: Error reading queue BIGTAB4+P+o.opdb-o.gwdb, subqueue 10 [sp_opst(que)/27387]
10/03/06 09:26 Error: 15009 - Can't rewind poster queue que_BAD_MSGHDR: Invalid message header detected [sp_opst (for o.opdb-o.gwdb queue BIGTAB4)/27387]
10/03/06 09:26 Process exited sp_opst (for o.opdb-o.gwdb queue BIGTAB4) [pid = 27387] - exit(1)

Capture queue corruption:

07/27/06 17:41 Process exited sp_ordr (for o.CRW04) [pid = 23854] - exit(1)
07/27/06 17:41 Error: 11000 - sp_ordr failed: Can't read capture queue (que_ERR: Non specific error) [sp_ordr/23854]
07/27/06 17:41 Notice: Error reading queue o.CRW04+C, subqueue 0 [sp_ordr(que)/23854]
07/27/06 17:41 Notice: peekahead failure buffer hdr=0xef811268 addr=0xef8351c8 [sp_ordr(que)/23854]
07/27/06 17:41 Notice: sque_read: peekahead failure magic=0 seq=46036694312 qseq=0 wseq=63633696078 [sp_ordr(que)/23854]
07/27/06 17:41 Process launched: sp_ordr (for o.CRW04) [pid = 23854]

or

01/16/06 23:50 Error: Cannot initialize queues: An illegal queue message has been detected [sp_cop(que)/9348]
01/16/06 23:50 Notice: Error reading queue o.cduat+C, subqueue 0 [sp_cop(que)/9348]
01/16/06 23:50 Notice: reseq/getdata bad hdr magic=0 query_seq=34709753398 seq=0 [sp_cop(que)/9348]
01/16/06 20:06 Error: queue initialization failed - exiting [sp_cop/1327]
01/16/06 20:06 Notice: 1 total subqueue headers processed [sp_cop(que)/1327]

Export queue corruption:

05/04/05 21:36 Notice: sp_cop(que) reseq/getdata bad hdr magic=0 query_seq=3473125707 seq=0
05/04/05 21:36 Notice: sp_co
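
The messages above share a few recurring strings, so a log can be scanned for them mechanically. The following is a minimal sketch, not a supported tool: the signature list is taken only from the samples in this article and is not exhaustive, and the names SIGNATURES, QUEUE_RE and scan_event_log are hypothetical.

```python
import re

# Signatures copied from the sample messages in this article (illustrative, not exhaustive).
SIGNATURES = [
    re.compile(r"bad hdr magic=0"),
    re.compile(r"peekahead failure"),
    re.compile(r"que_BAD_MSGHDR"),
    re.compile(r"An illegal queue message has been detected"),
    re.compile(r"Can't read capture queue"),
]

# Recovers the queue name and subqueue number from "Error reading queue ..." lines.
QUEUE_RE = re.compile(r"Error reading queue (\S+), subqueue (\d+)")


def scan_event_log(lines):
    """Return (hits, queues).

    hits   -- log lines that contain one of the signatures
    queues -- set of (queue name, subqueue number) pairs named in read errors
    """
    hits = []
    queues = set()
    for line in lines:
        line = line.rstrip("\n")
        if any(sig.search(line) for sig in SIGNATURES):
            hits.append(line)
        match = QUEUE_RE.search(line)
        if match:
            queues.add((match.group(1), int(match.group(2))))
    return hits, queues


if __name__ == "__main__":
    sample = [
        "10/03/06 09:26 Notice: reseq/getdata bad hdr magic=0 query_seq=509335035 seq=0 [sp_opst(que)/27387]",
        "10/03/06 09:26 Notice: Error reading queue BIGTAB4+P+o.opdb-o.gwdb, subqueue 10 [sp_opst(que)/27387]",
    ]
    hits, queues = scan_event_log(sample)
    for line in hits:
        print("suspect:", line)
    for name, subqueue in sorted(queues):
        print("queue:", name, "subqueue:", subqueue)
```

The queue name reported this way tells you which queue (post, capture or export) the corruption affects before running the procedure in the Resolution section.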
Cause

Queue corruption.
Resolution

Try the simple procedure below first.

Caution: do not use this procedure with SharePlex versions 5.1.1 through 5.1.5. On those versions it can cause replication to fail, which then requires a complete cleanup of the replication environment, a fresh synchronization, and a subsequent activation that needs database downtime.

1. Shut down SharePlex.
2. Invoke the qview program, located in the bin subdirectory under the product directory, and run "fixup all" as follows:
qview -i
qview> fixup all
3. Restart SharePlex.
The "fixup all" command also writes a log file named qvalid.log in /vardir/log, with details of the queue corruption that was found and fixed. The content of qvalid.log may not mean much to the user, but Support may find it useful for further troubleshooting. Here is a sample run of "fixup all", followed by the content of qvalid.log:
Queue corrupt - see report in /opt/splex/var/CRMRPGB/log/qvalid.log
VALIDATE splexgb+P+o.CRMPGB-o.CRMRPGB
Validate subqueue 65557
Validate subqueue 65556
.
.
.
Validate subqueue 65539
Validate subqueue 0
Queue corrupt - see report in /opt/splex/var/CRMRPGB/log/qvalid.log

Contents of qvalid.log:
VALIDATE ALL QUEUES, Thu Feb 5 13:37:55 2004
VALIDATE o.CRMRPGB+C
Sque 0: write commit is less than read release - 1195208384 < 1195241408
VALIDATE splexrepgb+X
VALIDATE splexgb+P+o.CRMPGB-o.CRMRPGB
Validate subqueue 65556 from 78203 to 81665
Corruption found in subqueue 65556
Starting at last read release point 78203, oldest sqmid 179
Queue data missing between (hex) offsets 0x1317b and 0x13f01
7 messages missing from subqueue
Data corruption extends to end of data 81665, last sqmid 187
Error 'No such file or directory' opening
/opt/splex/var/CRMRPGB/rim/splexgb+P+o.CRMPGB-o.CRMRPGB+65556.0000000
File for seq 0, fid 0, where write commit point is 81665, fid 0
Validate subqueue 65555 from 2466864 to 2617134
.
.
.
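
Because qvalid.log can be long, it may help to reduce it to a list of the affected subqueues. The following is a minimal sketch that assumes the log always uses the line formats shown in the sample above; the function name summarize_qvalid and the result layout are hypothetical, and other findings (such as the "Sque 0: write commit is less than read release" line) are not parsed.

```python
import re

# "VALIDATE <queue>" with a single token after it; skips "VALIDATE ALL QUEUES, <date>".
VALIDATE_RE = re.compile(r"^VALIDATE (\S+)$")
CORRUPT_RE = re.compile(r"^Corruption found in subqueue (\d+)")
MISSING_RE = re.compile(r"^(\d+) messages? missing from subqueue")


def summarize_qvalid(text):
    """Return one dict per corrupt subqueue: queue name, subqueue number, missing message count."""
    findings = []
    queue = None
    for raw in text.splitlines():
        line = raw.strip()
        match = VALIDATE_RE.match(line)
        if match:
            queue = match.group(1)  # remember which queue the following lines belong to
            continue
        match = CORRUPT_RE.match(line)
        if match:
            findings.append({"queue": queue, "subqueue": int(match.group(1)), "missing": None})
            continue
        match = MISSING_RE.match(line)
        if match and findings:
            findings[-1]["missing"] = int(match.group(1))
    return findings


if __name__ == "__main__":
    sample = """VALIDATE ALL QUEUES, Thu Feb 5 13:37:55 2004
VALIDATE o.CRMRPGB+C
VALIDATE splexgb+P+o.CRMPGB-o.CRMRPGB
Validate subqueue 65556 from 78203 to 81665
Corruption found in subqueue 65556
7 messages missing from subqueue
"""
    for item in summarize_qvalid(sample):
        print(item)
```

The missing-message count gives a first indication of how much data "fixup all" had to discard.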

Points to note:

1. There are other ways to resolve queue corruption, including other qview commands and workarounds, but this one stands out for its ease of use.
2. There is little room for operator error: the command is very simple, and if it contains a syntax error it will simply not run.
3. If the command does not resolve the queue corruption, contact Support to explore other options.
4. While resolving the corruption, the command may discard corrupted messages. The loss can range from none or minimal to complete data loss. However, queue corruption must be resolved before the affected queue(s) can move forward.
5. The command has been available since SharePlex version 5.1.
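
The version constraints in this article (available since 5.1, not to be used on 5.1.1 through 5.1.5) can be expressed as a small pre-check. This is a minimal sketch that assumes purely numeric dotted version strings; the function names parse_version and fixup_all_advisable are hypothetical, and the sketch does not query SharePlex for its version.

```python
def parse_version(version):
    """Turn a dotted version string such as '5.1.3' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def fixup_all_advisable(version):
    """Return True if the article's version constraints allow running "fixup all".

    Only the first three components are compared, so '5.1.5.2' is treated
    like 5.1.5 and rejected, which errs on the side of caution.
    """
    parsed = (parse_version(version) + (0, 0, 0))[:3]  # pad '5.1' to (5, 1, 0)
    if parsed < (5, 1, 0):
        return False  # the command does not exist before 5.1
    if (5, 1, 1) <= parsed <= (5, 1, 5):
        return False  # versions the article warns against
    return True


if __name__ == "__main__":
    for candidate in ("5.0.2", "5.1", "5.1.3", "5.1.6", "6.0"):
        print(candidate, fixup_all_advisable(candidate))
```

A version string with non-numeric parts raises ValueError in this sketch, which is preferable to guessing.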
Additional Information

If "fixup all" reports an error in qvalid.log such as "Error 'No such file or directory' opening /u01/oradata/var2600/rim/logvq_oce+P+o.dsrv-o.vbv1+1.0209104", here is a possible workaround:
1. Create the missing file manually as an empty file (touch).
2. Re-run "fixup all".
You may have to repeat this several times with different file names; past a certain point, resetting the queue may be easier.
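
When several files are reported missing, the touch step can be scripted. The following is a minimal sketch that assumes the error text appears exactly as in the samples in this article, with the path either on the same line or on the next line; the function names missing_queue_files and touch_missing are hypothetical. It defaults to a dry run so nothing is created until the list has been reviewed.

```python
import re
from pathlib import Path

# The path may follow on the same line or, as in the sample qvalid.log, on the next line.
OPEN_ERR_RE = re.compile(r"Error 'No such file or directory' opening\s*(\S*)")


def missing_queue_files(text):
    """Return the absolute paths reported as missing, in order, without duplicates."""
    lines = [line.strip() for line in text.splitlines()]
    paths = []
    for index, line in enumerate(lines):
        match = OPEN_ERR_RE.search(line)
        if not match:
            continue
        path = match.group(1)
        if not path and index + 1 < len(lines):
            path = lines[index + 1]  # path is on the following line
        if path.startswith("/") and path not in paths:
            paths.append(path)
    return paths


def touch_missing(paths, dry_run=True):
    """Create each missing file as an empty file, like touch. Returns the paths acted on."""
    acted_on = []
    for path in map(Path, paths):
        if path.exists():
            continue  # nothing to do for files that are already there
        if not dry_run:
            path.touch()
        acted_on.append(str(path))
    return acted_on


if __name__ == "__main__":
    log_text = (
        "Error 'No such file or directory' opening\n"
        "/opt/splex/var/CRMRPGB/rim/splexgb+P+o.CRMPGB-o.CRMRPGB+65556.0000000\n"
    )
    print(touch_missing(missing_queue_files(log_text), dry_run=True))
```

Run it with SharePlex shut down, review the dry-run output, then call touch_missing with dry_run=False and re-run "fixup all".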
