About source and target data
SharePlex replication uses the concepts of source and target.
- The source data is the primary data that is to be replicated. This data resides on the source system.
- The target data is a full or subset copy of the primary data.This data resides on the target system. The object of replication is to keep the source and target data synchronized, or in-sync. This means that the state of the source data is reflected accurately by the target data, adjusting for any transformation that is performed and for any time lag in the replication stream.
As changes are made to source data, SharePlex continuously replicates them to the target system. The target data can take the form of any of the SharePlex-supported target types: tables in a database, messages in a JMS queue or Kafka topic, or XML or SQL records in a file that can be consumed by other software programs.
You can replicate data from one source to many targets in a direct or cascading configuration, and you can replicate from many sources to one target.
About the SharePlex architecture
This topic explains the basic components of SharePlex replication.
Note: This topic explains the default configuration of SharePlex. You can customize the SharePlex configuration to add additional queues and processes for the purpose of isolating data streams or improving performance.
Queues store the replicated data as it is transported from the source system to the target system. Queues are part of a checkpoint recovery system that facilitates safe, asynchronous transport of data. Data travels through the queues in the sequence in which it was generated.
Data is not read-released (deleted) from one queue until it is written to the next one. Data accumulates in the queues on the source and target systems if the network, system, or database slows down or fails, or when a replication process stops. When the problem or outage is resolved, SharePlex resumes processing from the point where it stopped.
SharePlex replication uses the following queues:
- Capture Queue: The capture queue resides on the source system and stores captured data for further processing by SharePlex. There is one capture queue for each data source that is being replicated. A capture queue is identified by the datasource, for example o.oraA.
- Export Queue: The export queue resides on the source system. It holds data that has been processed by SharePlex and is ready for transport to the target system. By default, there is one export queue on a source system regardless of the number of active configurations or target systems. A default export queue is identified by the name of the source system on which it resides, for example, SysA. You can instruct SharePlex to create additional named export queues for more complex replication strategies. See .
- Post Queue: The post queue resides on the target system. It holds data that is ready for writing to the target. On each target system, there is one post queue for the replication stream between a datasource and its target. For example, if OraA and OraB are both replicating to OraC, there are two post queues. A default post queue is identified by the name of the source system plus the datasource and the target, for example SysA (o.oraA-o.oraB). You can instruct SharePlex to create additional named post queues for more complex replication strategies. See .
Note: All SharePlex queue files are created and maintained in the rim sub-directory of the SharePlex variable-data directory.
SharePlex replicates data through a series of replication processes that are started by the main SharePlex process, sp_cop.
- The Capture process: The Capture process reads the transaction logs or archive logs on the source system for changes to objects that are configured for replication by SharePlex. The Capture process writes the data to the capture queue, where it accumulates until the Read process is ready for it. There is a separate Capture process for each datasource being replicated, each functioning concurrently and independently. The Capture process is named sp_ocap (Oracle Capture).
- The Read process: The Read process operates on the source system to read data from the capture queue and add routing information to the data. After processing the data, the Read process sends it to the Export queue. The Read process is named sp_ordr (Oracle Reader).
- The Export process: The Export process operates on the source system to read data from the export queue and send it across the network to the target system. By default, a source system has one Export process for each of its target systems. For example, if there are two target systems, there are two Export processes. Each Export process controls transmission of all data that is being replicated from a source system to a target system. The Export process is the first part of the Export/Import transport pair, which moves data between systems over a TCP/IP network. The Export process is named sp_xport.
The Import process: The Import process is the second half of the Export/Import transport pair. It operates on the target system to receive data and build a post queue. There is one Import process for each Export process. For example, if there are two source systems (each with an Export process) replicating data to a target system, there are two Import processes on the target. The Import process is named sp_mport.
Note: It is possible to replicate data between Oracle instances on the same system. In this case the Export and Import processes are not created. The Read process places data directly into a post queue on that system.
- The Post process: The Post process operates on the target system to read the post queue and apply the replicated operations to the target. There is a Post process for each post queue on a target system. Multiple Post processes can operate simultaneously on a system. The Post process is sp_opst_mt when configured to post to an Oracle database and sp_xpst when configured to post to an Open Target target.
All communication and movement of data by SharePlex is handled by an internal messaging and transport system, using an asynchronous stream protocol with TCP/IP connections that is very efficient for large data transfers. This method ensures optimal performance, reliability and restart capabilities, while conserving communication bandwidth. SharePlex can replicate over any TCP/IP network, including WAN environments.
SharePlex processes recover from network failures. When the network is unavailable, data is queued on the source system. SharePlex detects when the network is available again and resumes transmission.
How SharePlex replicates data
SharePlex reads the blocks of the redo logs and replicates transactional changes that are made to objects that are specified in a configuration file. In the configuration file, you specify which data to replicate and the target to which it is applied. When a configuration is activated. SharePlex replicates only the changes made to the source data, providing a fast and reliable replication solution. For more information about configuration files, see
From the information that it has about an operation, SharePlex creates one or more messages that are sent from the Capture process to the Read process, and from the Read process to all of the other replication processes in the replication stream. A message can reflect a SQL operation or an internal SharePlex operation, but most of the time it is an INSERT, UPDATE, DELETE, COMMIT, TRUNCATE or a supported DDL operation.
Large operations like those on LONG or LOB columns can require more than one message because a message has a size limitation. Other operations, such as array inserts of small records, have the inverse effect: There could be one record for numerous operations. For example, an array insert of 70,000 rows might be accommodated with only 700 messages, depending on the data.
In general, unless you are replicating numerous changes to LONGs and BLOBs, you can assume that the number of messages shown in the status output for a process or queue approximately corresponds to the same number of SQL operations.
The Post process reads messages from the post queue and applies the replicated data changes to the target. In the case of a database target, Post constructs SQL statement to apply the data. In the case of non-database targets, Post outputs data records in the format required by the target, for example a JMS queue.
The following explains the default ways that SharePlex builds SQL statements on the target system.
- If the change is an INSERT, SharePlex uses all of the columns in the row to build an INSERT statement.
- If the change is a DELETE, SharePlex uses only the key to build a WHERE clause to locate the correct row. If a table lacks a key, SharePlex simulates one by using the values of all of the columns, except LONG and LOB columns. You can specify columns to use as a key when you create the configuration file.
- If the change is an UPDATE, SharePlex uses the key plus the values of the changed columns to build a WHERE clause to locate the correct row. Before applying changes to the database, the Post process compares a pre-image of the values of the source columns to the existing values of the target columns. The pre-image (also known as the before image) is the value of each changed column before the UPDATE. If the pre-image and the existing target values match, confirming a synchronized state, Post applies the changes. If not, then Post logs the operation to an error file and SharePlex returns an “out-of-sync” error.
- If the change is an UPDATE or DELETE statement that affects multiple rows on the source machine, SharePlex issues multiple statements on the target to complete the task. For example, an UPDATE tableA set name = ‘Lisa’ where rownum < 101 statement actually sends 100 UPDATE statements to the target, even though only one statement was issued on the source.
The sp_cop program coordinates the SharePlex replication processes: (Capture, Read, Export, Import, Post) and the SharePlex queues, and it initiates all of the other background processes that perform specific tasks. It also maintains communication with other systems in the replication network. In general, most SharePlex users have little interaction with sp_cop other than to start and stop it. Once started, sp_cop runs in the background.
- Only a SharePlex Administrator (member of the SharePlex admin group) can start or stop sp_cop.
- sp_cop must be started on all source and target systems involved in replication.
- Start sp_cop as soon as (or before) users access the data on the source system, so that all SharePlex processes are ready to start processing transactions. That way, Capture can keep pace with the changes that are made to the source data.
- If Capture loses pace with the rate at which source transactions are generated, the redo logs could wrap before Capture is finished reading them. Log wrap is when the online Oracle redo logs reach their maximum size and number and Oracle starts writing new logs over the existing ones. If SharePlex has not captured all of the changes before the logs wrap, it reads the archive logs, but by then there could be unacceptable latency between source and target data. In that case, it might be more practical to resynchronize the data.
For more information about the replication processes, see About the SharePlex architecture.
Use sp_ctrl to issue the commands that start, stop, configure, direct and monitor SharePlex activities. The sp_ctrl program interacts internally with the sp_cnc (command and control) process, which is the child process of sp_cop that executes the commands. Users do not interact with sp_cnc itself.