SharePlex 11.4 - Installation and Setup guide

Set up replication from Oracle to Kafka

Overview

The SharePlex Post process can connect and write to a Kafka broker. Post can write the data in JSON or XML format as a sequential series of operations in the order in which they occurred on the source. A Kafka consumer can then consume this data.

This section contains setup instructions that are specific to this target. Install SharePlex on the source and target according to the appropriate directions in this manual before performing these setup steps.

For the versions, data types and operations that are supported when using SharePlex to replicate to this target, see the SharePlex Release Notes.

Guidelines for posting to Kafka

A SharePlex Post process acts as a Kafka producer. A SharePlex Post process can write to one or more topics that have one or more partitions.
The SharePlex Post process does not create a topic itself, but you can configure the Kafka broker to auto-create topics.

Configure SharePlex on the source

You need to set up SharePlex and the database on the Oracle source system. For detailed setup steps, see Configure SharePlex on the source.

Configure SharePlex on the target

These instructions configure the SharePlex Post process to connect to Kafka. You must have a running Kafka broker.

To configure post to Kafka:

Create a Kafka topic.
Start sp_cop. (Do not activate the configuration yet.)
Run sp_ctrl.
Issue the target command to configure posting to a Kafka broker and topic. The following are example commands.

sp_ctrl> target x.kafka set kafka broker=host1:9092,host2:9092,host3:9092

sp_ctrl> target x.kafka set kafka topic=shareplex

See View and change Kafka settings for command explanations and options.

Note: Specify more than one broker so that SharePlex will attempt to connect to the other brokers in the list if any one of them is down.
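Before issuing the target command, it can help to confirm that the broker list is well formed and that each broker accepts connections. The following Python sketch is illustrative only and is not part of SharePlex: the function names parse_broker_list and reachable are hypothetical, and the check only opens a TCP connection; it does not speak the Kafka protocol.

```python
import socket


def parse_broker_list(value):
    """Split 'host1:9092,host2:9092' into [('host1', 9092), ('host2', 9092)]."""
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        brokers.append((host, int(port)))
    if not brokers:
        raise ValueError("broker list is empty")
    return brokers


def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling reachable(host, port) for each pair returned by parse_broker_list("host1:9092,host2:9092,host3:9092") shows which of the listed brokers are currently accepting connections.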

Set the Kafka record format

SharePlex can write records to Kafka in either XML or JSON format. XML is the default. To set the record format and specify format options, use one of the following target commands:

target x.kafka set format record=json

or:

target x.kafka set format record=xml

To view samples of these formats, see the format category of the target command documentation in the SharePlex Reference Guide.

Note: When replicating data from Oracle to Kafka in JSON format, SharePlex does not support the varray data type or the varray type inside the SDO_GEOMETRY data type.

View and change Kafka settings

To view current property settings for output to Kafka, use the following target command:

target x.kafka show

To change a property setting, use the following command.

target x.kafka [queue queuename] set kafka property=value

where:

queue queuename is the name of a Post queue. Use this option if there are multiple Post processes.
property and value are shown in the following table.

Table 3: Kafka target properties

Property	Input Value	Default
broker=broker	Required. The host and port number of the Kafka broker, or a comma delimited list of multiple brokers. This list is the bootstrap into the Kafka cluster. So long as Post can connect to one of these brokers, it will discover any other brokers in the cluster.	localhost:9092
client_id=ID	Optional. A user-defined string that Post will send in each request to help trace calls.	None
compression.code={none, gzip, snappy}	Optional. Controls whether data is compressed in Kafka. Options are none, gzip or snappy.	None

partition={number | rotate | rotate trans | messagekey}

Required. One of the following:

A fixed partition number: Directs Post to post messages only to the specified partition number. For example, setting it to 0 directs Post to post only to partition 0. This option is suitable for use in testing or if the target has multiple channels of data posting to the same Kafka topic.
The keyword rotate: Directs Post to apply messages to all of the partitions of a topic in a round-robin fashion. The partition changes with each new message. For example if a topic has three partitions, the messages are posted to partitions 0,1,2,0,1,2, and so on in that order.
The keyword rotate trans: This is similar to the rotate option, except that the partition is incremented with each transaction rather than with each message. For example, if a topic has three partitions, the messages are posted to partition 0 until the commit, then to partition 1 until the commit, and so on in that order. This option is suitable if you are replicating multiple tables to a single topic. It allows you to distribute data across several partitions, while still preserving all of the operations of a transaction together in a single partition. This enables a consumer that reads from a single partition to receive a stream of complete transactions.
The keyword messagekey: Directs Post to select the partition for each message by applying the default partition hash function to the message key. Use this option to place all messages with the same key values in the same partition.

Notes:

The LOB and CLOB columns are not considered Kafka partition keys.
For a table without a primary key, unique key, composite key, or unique index, all columns (except LOB and CLOB columns) are considered key columns. When an ALTER statement is run on such a table, the DDL statement is replicated to all partitions, and subsequent DML statements are sent to specific partitions based on the existing columns.
If the replication table has no key defined, SharePlex will consider all table columns as Kafka messagekey. For non-key tables, it is recommended to use SharePlex user-defined keys. For more information, see the Define a Unique Key: PostgreSQL to PostgreSQL section in the SharePlex Admin Guide.
If multiple tables are in replication and specific tables need a different partition type while the remaining tables are partitioned by messagekey, define a named post queue for those specific tables.

For example:

target x.kafka queue <queue_name> set kafka partition={number | rotate | rotate trans}

For the rest of the tables, use the following command:

target x.kafka set kafka partition=messagekey

Important:

When partitioning is based on the messagekey, messages that do not contain key information will be mapped according to Kafka's internal hash function. These messages may include commit, schema, rollback, savepoint, and DDL statements.

During replication, if the number of partitions is increased, the existing mapping of keys to partitions will no longer remain valid.

For tables with a few columns serving as indexes and no other constraints defined, use those indexes as unique keys in the SharePlex config file.

For example, the following table has an index defined on two columns: ID and NAME.

create table mytable(ID NUMBER(25,2),NAME CHAR(200),COL_VARCHAR2 VARCHAR2(400),COL_RAW RAW(1000));

CREATE INDEX indx_mytable ON mytable(ID,NAME);

In the SharePlex config file, define the index columns as a unique key.

datasource:o.SID
src.mytable	!key(ID,NAME)	host

For more information, see the Define a Unique Key: Oracle to Oracle section in the SharePlex Admin Guide.

For tables with no constraints or indexes defined, users can define unique keys during configuration in SharePlex.

For a table that has a composite key, if any of the key values are modified, the modification message will be placed in the current partition, and subsequent messages may or may not be assigned to the same partition.
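The partition options described above can be compared with a small simulation. This Python sketch is not SharePlex code: the message dictionaries and function names are hypothetical, and CRC32 is only a stand-in for the partitioner's hash function, which this guide does not specify.

```python
import zlib


def rotate(messages, num_partitions):
    """rotate: the partition changes with each message (0, 1, 2, 0, 1, 2, ...)."""
    return [(i % num_partitions, msg) for i, msg in enumerate(messages)]


def rotate_trans(messages, num_partitions):
    """rotate trans: the partition advances only after a commit,
    so every operation of a transaction lands in the same partition."""
    assigned = []
    partition = 0
    for msg in messages:
        assigned.append((partition, msg))
        if msg["op"] == "commit":
            partition = (partition + 1) % num_partitions
    return assigned


def by_messagekey(key, num_partitions):
    """messagekey: the same key always maps to the same partition
    while num_partitions is unchanged. CRC32 is an illustrative
    stand-in hash, not the real partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Because by_messagekey takes the hash modulo the partition count, the same key can map to a different partition after the count changes, which is why the existing mapping does not remain valid when partitions are added during replication.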

request.required.acks=value

Optional. This is a Kafka client parameter. By default it is set to a value of -1, which means all. Consult the Kafka documentation about this subject, because all really means all in-sync replicas. This parameter can be used in conjunction with the min.insync.replicas broker parameter to tune behavior between availability and data consistency.

Important: It is possible for data to be lost between a Kafka producer (SharePlex in this case) and a Kafka cluster, depending on these settings.

-1

topic=topic_name

Required. The name of the target Kafka topic.

This string may contain the special sequences %o or %t. The %o sequence is replaced by the owner name of the table that is being replicated. The %t sequence is replaced by the table name of the table that is being replicated. This feature may be used in conjunction with a Kafka server setting of auto.create.topics.enable set to 'true'. Also review your server settings for default.replication.factor and num.partitions because these are used as defaults when topics are auto-created.

Important! If using multiple topics, you must also set the following properties with the target command:

The output must be in JSON. Set the record property of the format category to json:

target x.kafka set format record=json

Commits must be disabled. Set the commit property of the json category to no:

target x.kafka set json commit=no

shareplex

* To avoid latency, if Post detects no more incoming messages, it sends the packet to Kafka immediately without waiting for the threshold to be satisfied.
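To preview which topic names a topic setting with %o and %t would produce for a given set of tables, the substitution can be sketched as follows. This is an illustration in Python, not SharePlex code: expand_topic is a hypothetical helper, and it assumes the owner and table names are substituted verbatim, without case conversion.

```python
import re


def expand_topic(template, owner, table):
    """Replace %o with the owner name and %t with the table name.

    A single regular-expression pass is used so that text inserted
    for one sequence is never rescanned for the other.
    """
    values = {"%o": owner, "%t": table}
    return re.sub(r"%[ot]", lambda m: values[m.group(0)], template)
```

For example, expand_topic("%o.%t", "PROD", "parts") gives "PROD.parts", while a template without either sequence, such as the default shareplex, is returned unchanged.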

Set recovery options

If the Kafka process aborts suddenly, or if the machine that it is running on aborts, row changes may be written twice to the Kafka topic. The consumer must manage this by detecting and discarding duplicates.

Every record of every row-change operation in a transaction has the same transaction ID and is also marked with a sequence ID. These attributes are id and msgIdx, respectively, under the txn element in the XML output (see Set up replication from Oracle to Kafka).

The transaction ID is the SCN at the time the transaction was committed, and the sequence ID is the index of the row change in the transaction. These two values are guaranteed to be the same if they are re-written to the Kafka topic in a recovery situation.
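Because the transaction ID and sequence ID are stable across a recovery, a consumer can use the pair as a deduplication key. The following Python sketch shows one way to do this under stated assumptions: each record has already been parsed into a dictionary whose id and msgIdx keys hold the two txn attributes, and the set of seen keys fits in memory. A production consumer would more likely persist or bound that state.

```python
def discard_duplicates(records):
    """Yield each row change once, keyed on (transaction ID, sequence ID).

    Assumes each record is a dict with 'id' and 'msgIdx' keys taken
    from the txn element of the Post output.
    """
    seen = set()
    for record in records:
        key = (record["id"], record["msgIdx"])
        if key in seen:
            continue  # already delivered before the recovery; skip it
        seen.add(key)
        yield record
```

The function is a generator, so it can wrap any iterable of parsed records and passes through the first occurrence of each row change in its original order.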

If desired, you can configure Post to include additional metadata with every row-change record by using the following command:

target x.kafka [queue queuename] set metadata property[, property]

Table 4: Optional metadata properties

Property	Description
time	The time the operation was applied on the source.
userid	The ID of the database user that performed the operation.
trans	The ID of the transaction that included the operation.
size	The number of operations in the transaction.

Example:

target x.kafka set metadata time, userid, trans, size

To reset the metadata:

target x.kafka [queue queuename] reset metadata

To view the metadata:

target x.kafka [queue queuename] show metadata

Set up replication from Oracle to a SQL or XML file

Overview

SharePlex can post replicated Oracle data to a file formatted as SQL or XML. This data is written as a sequential series of operations as they occurred on the source, which can then be posted in sequential order to a target database or consumed by an external process or program.

For the versions, data types and operations that are supported when using SharePlex to replicate to this target, see the SharePlex Release Notes.

Configure SharePlex on the source

On the source, create a SharePlex configuration file that specifies capture and routing information. The structure that is required in a configuration file varies, depending on your replication strategy, but this shows you the required syntax for routing data to a SQL or XML file.

Datasource:o.SID
src_owner.table	!file[:tgt_owner.table]	host

where:

SID is the Oracle SID of the source Oracle database.
src_owner.table is the owner and name of the source table.
!file is a required keyword that directs Post to write to a file.
tgt_owner.table is optional and specifies the owner and name of the target table. Use if either component is different from that of the source table.
host is the name of the target system.

Note: For more information, see Configure SharePlex to Replicate Data in the SharePlex Administration Guide.

Source configuration example:

The following example replicates the parts table in schema PROD from Oracle instance ora112 to a file on target system sysprod.

Datasource:o.ora112

PROD.parts !file sysprod

Configure SharePlex on the target

By default, SharePlex formats data to a file in XML format, and there is no target setup required unless you want to change properties of the output file (see Set up Replication from Oracle to a SQL or XML File). To output in SQL format, use the target command to specify the SQL output as follows.

To output data in SQL format:

Start sp_cop.
Start sp_ctrl.
Issue the following required target commands to output the records in SQL.

Note: Use all lower-case characters.

target x.file [queue queuename] set format record=sql

target x.file [queue queuename] set sql legacy=yes

where: queue queuename constrains the action of the command to the SharePlex Post process that is associated with the specified queue.

See Set up Replication from Oracle to a SQL or XML File for descriptions of these settings and other optional properties that you can set.

To view samples of the SQL and XML formats, see the target command documentation in the SharePlex Reference Guide.

View and change target settings

To view current property settings for output to a file, use the following command:

target x.file show

To change a setting, use the following target command.

target x.file [queue queuename] set [category] property=value

For more information, see the target command in the SharePlex Reference Guide.

File storage and aging

Post writes to a series of files. The active working file is prepended with the label of current_ and is stored in the opx/current subdirectory of the variable-data directory.

Output Format	Name of Current File
SQL	current_legacy.sql
XML	current_prodsys.XML

Important: Do not open or edit the current_ file.

Post uses the max_records, max_size and max_time parameters to determine the point at which to start a new active file. When this switch occurs, Post moves the processed data to a sequenced file in the opx subdirectory of the variable-data directory. The file names include the name of the post queue, the time and date, and an incrementing ID.

SQL files:

/installed/vardir> ls -1 opx

0000000000_20140305130858_legacy.sql

0000000001_20140305131130_legacy.sql

0000000002_20140305131212_legacy.sql

0000000003_20140305133835_legacy.sql

0000000004_20140305134028_legacy.sql

XML files:

/installed/vardir> ls -1 opx

0000000000_20140305130858_prodsys.XML

0000000001_20140305131130_prodsys.XML

0000000002_20140305131212_prodsys.XML

0000000003_20140305133835_prodsys.XML

0000000004_20140305134028_prodsys.XML
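A process that consumes the sequenced files needs to read them in order and skip the active current_ file. The following Python sketch parses the file names shown in the listings above. It is an illustration only: it assumes the name layout is ID, then a 14-digit timestamp in YYYYMMDDHHMMSS form, then the post queue name, which is inferred from the examples rather than stated as a specification.

```python
import re
from datetime import datetime

# Layout inferred from the example listings: <ID>_<timestamp>_<queue>.<ext>
NAME_RE = re.compile(
    r"^(?P<seq>\d+)_(?P<stamp>\d{14})_(?P<label>.+)\.(?P<ext>sql|XML)$"
)


def parse_sequenced_name(name):
    """Return the parts of a sequenced file name, or None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return {
        "seq": int(m.group("seq")),
        "time": datetime.strptime(m.group("stamp"), "%Y%m%d%H%M%S"),
        "label": m.group("label"),
        "format": m.group("ext").lower(),
    }


def in_sequence(names):
    """Return only sequenced file names, ordered by their incrementing ID."""
    parsed = {n: parse_sequenced_name(n) for n in names}
    return sorted((n for n, p in parsed.items() if p), key=lambda n: parsed[n]["seq"])
```

Names that begin with current_ do not match the pattern, so in_sequence leaves the active working file out of the result.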

To force a file switch:

The current file cannot be viewed or consumed without stopping Post. To access the data in the current file, you can use the target command with the switch option to move the data to a sequenced file, from which it can then be consumed or viewed. After issuing this command, the switch occurs after Post processes a new record.

target x.file [queue queuename] switch

Installation and Setup for Cloud-Hosted Databases for Oracle

Overview of SharePlex Setup on Cloud

Post to PaaS Cloud from the Source Server for Oracle

Post to PaaS Cloud from an Intermediary Server for Oracle

Overview of SharePlex Setup on Cloud

SharePlex supports databases installed as services of Amazon Web Services (AWS) and Microsoft Azure. To view the cloud databases that SharePlex supports, refer to the Supported Cloud Platforms section in the SharePlex Release Notes document for the respective databases.

There are some differences in the way that SharePlex installs in an IaaS cloud environment, a PaaS cloud environment, and a SaaS cloud environment. These differences are only in the installation and configuration of SharePlex. Once installed and configured, SharePlex operates in the cloud the same way that it operates in on-premise installations.

Installation in an IaaS (accessible) environment

If your cloud database service is a true IaaS virtual computing environment, you can install and run a custom application environment, access the operating system, and manage access permissions and storage. In this environment, SharePlex is installed directly on the cloud server just as you would install it locally, without any special setup requirements.

In this environment, the following applies:

SharePlex can capture from an Oracle source database in an IaaS cloud.
SharePlex can Post to any supported target database in an IaaS cloud.
You can proceed to the standard installation instructions in this manual.

Install SharePlex on Linux/Unix for Oracle Database

Install SharePlex on Linux/Unix for Open Target Databases

Installation in a PaaS (non-accessible) environment

If your cloud database is installed in a true PaaS environment, you do not have access to the underlying operating system, and you must install SharePlex on a server that is external to the cloud deployment. You then configure SharePlex to interact with the target database through a remote connection.

SharePlex can capture data from supported sources and post it to databases in a PaaS environment using remote capture and remote post capabilities.

You can install SharePlex for a PaaS source and target in one of the following ways:

With remote post, you can use your on-premise production source server to run all of the SharePlex replication components. In this setup, both source and target replication processes (and their queues) are installed on one server. The SharePlex Post process connects through a remote connection to the target cloud database.

For more information, see Post to PaaS cloud from the source system.

NOTE: In a high-volume transactional environment, the buildup of data in the post queues and the presence of multiple Post processes may generate unacceptable overhead for a production system. In that case, you should use an intermediary server.

You can use an on-premise intermediary server to run the Import and Post components (and the post queues). Post connects to the cloud target through a remote connection. This method removes most of the replication overhead from the source server. For more information, see Post to PaaS cloud from an intermediary server.

Installation in a SaaS (non-accessible) environment

If your cloud database is hosted in a true SaaS environment, you do not have access to the underlying operating system, and you must install SharePlex on a server that is external to the cloud deployment. You then configure SharePlex to interact with the source and target databases through a remote connection.

You can install SharePlex for a SaaS target in one of the following ways:

With remote capture, you can utilize an on-premise or cloud VM intermediary server to install and configure SharePlex for running the Capture and Export processes. Capture establishes a remote connection to the source SaaS database, while Export communicates with Import and Post on the target system.
With remote Capture and remote Post, you can employ an on-premise or cloud VM intermediary server to install and configure SharePlex. Both Capture and Post processes will run on the same server. Capture establishes a remote connection to the source SaaS database, and Post establishes a remote connection to the target SaaS database.

Install SharePlex on Linux/Unix for PostgreSQL Database as a Service.
