The SharePlex Post process can connect and write to a Kafka broker. The data can be written in JSON or XML output as a sequential series of operations as they occurred on the source, which can then be consumed by a Kafka consumer.
These instructions contain setup instructions that are specific to this target. Install SharePlex on the source and target according to the appropriate directions in this manual before performing these setup steps.
For the versions, data types and operations that are supported when using SharePlex to replicate to this target, see the SharePlex Release Notes.
When replicating data to Kafka, configure the source database and SharePlex on the source system as follows.
On the source system, enable PK/UK supplemental logging in the Oracle source database. SharePlex must have the Oracle key information to build an appropriate key on the target.
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY, UNIQUE) COLUMNS;
On the source system, set the SP_OCT_USE_SUPP_KEYS parameter to a value of 1. This parameter directs SharePlex to use the columns set by Oracle's supplemental logging as the key columns when a row is updated or deleted. When both supplemental logging and this parameter are set, it ensures that SharePlex can always build a key and that the SharePlex key will match the Oracle key.
See the SharePlex Reference Guide for more information about this parameter.
On the source, create a SharePlex configuration file that specifies capture and routing information. The structure that is required in a configuration file varies, depending on your replication strategy, but this shows you the required syntax for routing data to Kafka.
Datasource:o.SID | ||
src_owner.table | !kafka[:tgt_owner.table] | host |
where:
host is the name of the target system.
Note: See Configure data replication.
Datasource:o.ora112
MY_SCHEMA.MY_TABLE !kafka sysprod
These instructions configure the SharePlex Post process to connect to Kafka. You must have a running Kafka broker.
To configure post to Kafka
Issue the target command to configure posting to a Kafka broker and topic. The following are example commands.
sp_ctrl> target x.kafkaset kafka broker=host1:9092,host2:9092,host3:9092
sp_ctrl> target x.kafka set kafka topic=shareplex
See View and change Kafka settings for command explanations and options.
Note: Specify more than one broker so that SharePlex will attempt to connect to the other brokers in the list if any one of them is down.
SharePlex can output to either XML or JSON format as input to Kafka. XML is the default. To set the input format and specify format options, use one of the following target commands:
target x.kafka set format record=json
or:
target x.kafka set format record=xml
To view samples of these formats, see the format category of the target command documentation in the SharePlex Reference Guide.
To view current property settings for output to Kafka, use the following target command:
target x.kafka show
To change a property setting, use the following command.
target x.kafka [queue queuename] set kafka property=value
where:
Table 3: Kafka target properties
Property | Input Value | Default |
---|---|---|
broker=broker |
Required. The host and port number of the Kafka broker, or a comma delimited list of multiple brokers. This list is the bootstrap into the Kafka cluster. So long as Post can connect to one of these brokers, it will discover any other brokers in the cluster. |
localhost:9092 |
client_id=ID |
Optional. A user-defined string that Post will send in each request to help trace calls. |
None |
compression.code={none, gzip, snappy} | Optional. Controls whether data is compressed in Kafka. Options are none, gzip or snappy. | None |
partition={number | rotate | rotate trans} |
Required. One of the following:
|
0 |
request.required.acks=value | Optional. This is a Kafka client parameter. By default it is set to a value of -1, which means all. Consult the Kafka documentation about this subject, because all really means all in-sync replicas. This parameter can be used in conjunction with the min.insync.replicas broker parameter to tune behavior between availability and data consistency. Important: It is possible for data to be lost between a Kafka producer (SharePlex in this case) and a Kafka cluster, depending on these settings. | -1 |
topic=topic_name |
Required. The name of the target Kafka topic. This string may contain the special sequences %o or %t. The %o sequence is replaced by the owner name of the table that is being replicated. The %t sequence is replaced by the table name of the table that is being replicated. This feature may be used in conjunction with a Kafka server setting of auto.create.topics.enabled set to 'true'. Also view your server settings for default.replication.factor and num.partitions because these are used as defaults when topics are auto created. Important! If using multiple topics, you must also set the following properties with the target command:
|
shareplex |
* To avoid latency, if Post detects no more incoming messages, it sends the packet to Kafka immediately without waiting for the threshold to be satisfied.
If the Kafka process aborts suddenly, or if the machine that it is running on aborts, row changes may be written twice to the Kafka topic. The consumer must manage this by detecting and discarding duplicates.
Every record of every row-change operation in a transaction has the same transaction ID and is also marked with a sequence ID. These attributes are id and msgIdx, respectively, under the txn element in the XML output (see Set up replication from Oracle to Kafka).
The transaction ID is the SCN at the time the transaction was committed, and the sequence ID is the index of the row change in the transaction. These two values are guaranteed to be the same if they are re-written to the Kafka topic in a recovery situation.
If desired, you can configure Post to include additional metadata with every row-change record by using the following command:
target x.kafka [queue queuename] set metadata property[, property]
Table 4: Optional metadata properties
Property | Description |
---|---|
time | The time the operation was applied on the source. |
userid | The ID of the database user that performed the operation. |
trans | The ID of the transaction that included the operation. |
size | The number of operations in the transaction. |
target x.kafka set metadata time, userid, trans, size
To reset the metadata
target x.kafka [queue queuename] reset metadata
To view the metadata
target x.kafka [queue queuename] show metadata
SharePlex can post replicated Oracle data to a file formatted as SQL or XML. This data is written as a sequential series of operations as they occurred on the source, which can then be posted in sequential order to a target database or consumed by an external process or program.
These instructions contain setup instructions that are specific to this target. Install SharePlex on the source and target according to the appropriate directions in this manual before performing these setup steps.
For the versions, data types and operations that are supported when using SharePlex to replicate to this target, see the SharePlex Release Notes.
On the source, create a SharePlex configuration file that specifies capture and routing information. The structure that is required in a configuration file varies, depending on your replication strategy, but this shows you the required syntax for routing data to a SQL or XML file.
Datasource:o.SID | ||
src_owner.table | !file[:tgt_owner.table] | host |
where:
src_owner.table is the owner and name of the source table.
Note: For more information, see Configure data replication in the SharePlex Administration Guide.
The following example replicates the parts table in schema PROD from Oracle instance ora112 to a file on target system sysprod.
Datasource:o.ora112
PROD.parts !file sysprod
By default, SharePlex formats data to a file in XML format, and there is no target setup required unless you want to change properties of the output file (see Set up replication from Oracle to a SQL or XML file.) To output in SQL format, use the target command to specify the SQL output as follows.
To output data in SQL format
Issue the following required target commands to output the records in SQL. Note: Use all lower-case characters.
target x.file [queue queuename] set format record=sql
target x.file [queuequeuename] set sql legacy=yes
where: queue queuename constrains the action of the command to the SharePlex Post process that is associated with the specified queue.
See Set up replication from Oracle to a SQL or XML file for descriptions of these settings and other optional properties that you can set.
To view samples of the SQL and XML formats, see the target command documentation in the SharePlex Reference Guide.
To view current property settings for output to a file, use the following command:
target x.file show
To change a setting, use the following target command.
target x.file [queue queuename] set [category] property=value
Post writes to a series of files. The active working file is prepended with the label of current_ and is stored in the opx/current subdirectory of the variable-data directory.
Output Format | Name of Current File |
---|---|
SQL |
current_legacy.sql |
XML |
current_prodsys.XML |
Important: Do not open or edit the current_ file.
Post uses the max_records, max_size and max_time parameters to determine the point at which to start a new active file. When this switch occurs, Post moves the processed data to a sequenced file in the opx subdirectory of the variable-data directory. The file names include the name of the post queue, the time and date, and an incrementing ID.
SQL files:
/installed/vardir> ls -1 opx
0000000000_20140305130858_legacy.sql
0000000001_20140305131130_legacy.sql
0000000002_20140305131212_legacy.sql
0000000003_20140305133835_legacy.sql
0000000004_20140305134028_legacy.sql
XML files:
/installed/vardir> ls -1 opx
0000000000_20140305130858_prodsys.XML
0000000001_20140305131130_prodsys.XML
0000000002_20140305131212_prodsys.XML
0000000003_20140305133835_prodsys.XML
0000000004_20140305134028_prodsys.XML
To force a file switch
The current file cannot be viewed or consumed without stopping Post. To access the data in the current file, you can use the target command with the switch option to move the data to a sequenced file, from which it can then be consumed or viewed. After issuing this command, the switch occurs after Post processes a new record.
target x.file [queue queuename] switch
Prework for the demonstrations
Create and activate a configuration
Before you run the basic demonstrations, have the following items available.
You will replicate splex.demo_src from the source system to splex.demo_dest on the target system. These tables are installed by default into the SharePlex schema, which in these demonstrations is "splex." Your SharePlex schema may be different. Verify that these tables exist.
Column Name | Data Type | Null? |
NAME | varchar2(30) | |
ADDRESS | verchar2(60) | |
PHONE | varchar2(12) |
|
© ALL RIGHTS RESERVED. Terms of Use Privacy Cookie Preference Center