SharePlex Connector for Hadoop 1.0 - Installation and Setup Guide

Run conn_setup.sh

Run this script to set up SharePlex Connector for Hadoop and provide the necessary configuration details. This is usually a one-time activity.

Shell Script Usage

[user@host bin]$ ./conn_setup.sh

Tip: See conn_setup.sh for a complete description of this command.

Configuration Parameters

The script prompts you for each configuration parameter, one by one.

NOTE: Default values are provided within brackets. Press Enter to select the default value.

Categories of detail

You will be prompted to provide the following details...

HDFS

Do you want to enable Hadoop connector to copy data to HDFS?

Answer yes to this question if you intend to replicate all (or most) tables using HDFS Near Real Time Replication.

HBase

Do you want to enable Hadoop connector to copy data to HBase?

Answer yes to this question if HBase is set up in your environment and you intend to replicate all (or most) tables using HBase Real Time Replication.

HBase parameters

You will be prompted for this detail if you have responded YES to replicate tables to HBase.

  • The HBase column family name

HDFS parameters

You will be prompted for this detail if you have responded YES to replicate tables to HDFS.

  • The HDFS destination directory. Treat this directory as reserved for SharePlex Connector for Hadoop, because it may be cleaned up by conn_cleanup.sh and uninstall.sh.

How often do you want to copy data to HDFS? This is measured by time and number of changes.

The first question relates to time (in minutes). If you enter 10, for example, the table is replicated every 10 minutes. Do not set this value below 10 minutes.

The second question relates to the number of changes. If you enter 2, replication runs after 2 changes to the table.

Replication is executed on the first condition met: on the given number of changes to the table or the set time period, whichever comes first.
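
The "whichever comes first" rule can be sketched as a small shell function. This is an illustration of the rule only, not the connector's own code. The function name should_replicate and its four arguments are hypothetical and do not correspond to connector settings.

```shell
#!/bin/sh
# Illustration only: mirrors the "whichever comes first" rule described above.
# Usage: should_replicate ELAPSED_MIN PENDING_CHANGES INTERVAL_MIN CHANGE_THRESHOLD
# (hypothetical helper; the argument names are not connector parameters)
should_replicate() {
    elapsed_min=$1        # minutes since the last replication of the table
    pending_changes=$2    # changes to the table since the last replication
    interval_min=$3       # answer to the first question (time)
    change_threshold=$4   # answer to the second question (number of changes)
    if [ "$elapsed_min" -ge "$interval_min" ] ||
       [ "$pending_changes" -ge "$change_threshold" ]; then
        echo "replicate"
    else
        echo "wait"
    fi
}

should_replicate 3 2 10 2    # prints "replicate": change count reached first
should_replicate 10 1 10 2   # prints "replicate": time period reached first
should_replicate 3 1 10 2    # prints "wait": neither condition met yet
```

With a time setting of 10 minutes and a change setting of 2, a table that receives 2 changes after 3 minutes is replicated immediately rather than waiting for the full 10 minutes.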

JMS parameters

  • The name of the JMS queue. For example: OpenTarget
  • The name of the host running ActiveMQ. For example: localhost
  • The port number used by the JNDI provider.url property. For example: 61616
  • The port number used to access the ActiveMQ admin web site. By default: 8161

Tip: For more information on each of these properties, see Configure ActiveMQ to work with SharePlex.
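
As a rough illustration of how these values fit together, the sketch below composes the two addresses they typically correspond to in a default ActiveMQ installation: a tcp:// broker URL for the JNDI provider.url property and the /admin path of the web console. The function name jms_endpoints is hypothetical, and both URL shapes are assumptions based on ActiveMQ defaults, so check them against your own broker configuration.

```shell
#!/bin/sh
# Usage: jms_endpoints HOST JNDI_PORT ADMIN_PORT
# (hypothetical helper, not part of SharePlex Connector for Hadoop)
jms_endpoints() {
    host=$1
    jndi_port=$2
    admin_port=$3
    # Assumed URL shapes for a default ActiveMQ installation.
    echo "provider.url=tcp://$host:$jndi_port"
    echo "admin=http://$host:$admin_port/admin"
}

# Example values taken from the parameter list above.
jms_endpoints localhost 61616 8161
```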

Oracle parameters

  • Host name or TCP/IP address of the Oracle server
  • The port used to connect to the Oracle server. By default: 1521
  • Oracle instance (SID)
  • Oracle username

NOTE: You will be prompted to enter the Oracle password when taking a snapshot. Refer to the use cases for more information.

Configuration Complete

When setup completes, SharePlex Connector for Hadoop displays the following messages and is ready for use.

connectorConfiguration.Xml updated successfully.
JMSConfiguration.xml updated successfully.
OraOopConfiguration.xml updated successfully.
Connector setup completed successfully.

Use Cases

Setup and Start Replication

NOTE: Ensure you complete all Initial Setup instructions first.

1. Start SharePlex for Oracle and sp_ctrl

Ensure SharePlex for Oracle and sp_ctrl are running. The prompt should be sp_ctrl (host:port)>. Refer to the SharePlex for Oracle documentation for more information.

/u01/app/shareplex/prod/bin > ./sp_ctrl

sp_ctrl ()>

2. Define the Oracle tables to replicate

Use the SharePlex for Oracle create config command to create the file ConfigFile. The file is opened in vi. Declare all the Oracle tables you want captured into Hadoop, one table per line.

sp_ctrl ()> create config ConfigFile

#######################

datasource: O.OracleSID

OracleSchema.OracleTable !jms[:TargetSchema.TargetTable] IPHostPostQueue[:PostQueueName]

#####################

Example line: soo70.G_AUTHORS !jms 10.20.26.28:q2

IPHostPostQueue is the name or IP address of the host on which the SharePlex post queue is running.

For more information on PostQueueName, see Configure ActiveMQ to work with SharePlex.

Tip: To verify that the config file contains no errors, run the command sp_ctrl ()> verify config ConfigFile
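
When many tables are involved, the table lines can be generated rather than typed by hand. The sketch below prints a datasource line followed by one line per table in the format shown above, ready to paste into the editor opened by create config. The function name make_config is hypothetical, ORCL and soo70.G_BOOKS are placeholder values, and the optional :TargetSchema.TargetTable part is left out.

```shell
#!/bin/sh
# Usage: make_config ORACLE_SID POST_QUEUE_HOST POST_QUEUE_NAME TABLE...
# (hypothetical helper; it only prints text and does not call sp_ctrl)
make_config() {
    sid=$1
    post_host=$2
    queue=$3
    shift 3
    printf 'datasource: O.%s\n' "$sid"
    # One "Schema.Table !jms Host:Queue" line per remaining argument.
    for table in "$@"; do
        printf '%s !jms %s:%s\n' "$table" "$post_host" "$queue"
    done
}

# ORCL and soo70.G_BOOKS are placeholders for illustration.
make_config ORCL 10.20.26.28 q2 soo70.G_AUTHORS soo70.G_BOOKS
```

The second line of the output matches the example line given above (soo70.G_AUTHORS !jms 10.20.26.28:q2).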

3. Stop the post queue

SharePlex for Oracle uses the post queue to send messages to the JMS queue.

sp_ctrl ()> stop post queue PostQueueName

4. Run activate config

Use the SharePlex for Oracle activate config command to activate the file ConfigFile.

sp_ctrl ()> activate config ConfigFile

Tip: If you see the error "minimal supplemental logging should be enabled", see "Oracle Data Source" for more information.

5. SharePlex Connector for Hadoop - Run conn_snapshot.sh

Execute the SharePlex Connector for Hadoop conn_snapshot.sh script for each Oracle table in the ConfigFile. This makes a copy of each Oracle table to replicate.

NOTE: Take a snapshot of all the tables defined in the config file before you start the post queue. Once you start the post queue, SharePlex Connector for Hadoop will ignore messages for those tables that have not had a snapshot taken.

The conn_snapshot.sh script is fully customizable and is documented in full in conn_snapshot.sh.

» conn_snapshot.sh -t Schema.Table -s ';'

NOTE: You will be prompted to enter the Oracle password. This is the password for the Oracle username supplied during configuration. See "Run conn_setup.sh" for more information.
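
Because a snapshot is needed for every table in the config file, it can help to list the required commands before running them. The sketch below reads a plain-text copy of the config file and prints one conn_snapshot.sh command per table line, using the same options as the example above. It makes several assumptions: the function name list_snapshot_commands is hypothetical, the path to the copy is supplied by you (this guide does not state where SharePlex stores the file), and every line that is not blank, a # comment, or a datasource: line is treated as a table line. The commands are printed rather than executed, since each run prompts for the Oracle password.

```shell
#!/bin/sh
# Usage: list_snapshot_commands PATH_TO_CONFIG_COPY
# (hypothetical helper; prints commands, does not run them)
list_snapshot_commands() {
    awk '
        NF == 0              { next }   # skip blank lines
        $1 ~ /^#/            { next }   # skip comment and separator lines
        $1 ~ /^datasource:/  { next }   # skip the datasource line
        # The first field of a table line is OracleSchema.OracleTable.
        { printf "conn_snapshot.sh -t %s -s '\'';'\''\n", $1 }
    ' "$1"
}
```

For a copy that contains the example line soo70.G_AUTHORS !jms 10.20.26.28:q2, the function prints conn_snapshot.sh -t soo70.G_AUTHORS -s ';'.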

6. SharePlex for Oracle - Start the post queue

Start the post queue so SharePlex for Oracle can send messages from the post queue to the JMS queue.

» sp_ctrl ()> start post queue PostQueueName

7. Start SharePlex Connector for Hadoop

Return to SharePlex Connector for Hadoop. For more on the conn_ctrl.sh command, see conn_ctrl.sh.

» conn_ctrl.sh start

Replication Paused or Data Inconsistent (Out of Sync)

SharePlex Connector for Hadoop compares the values stored in HBase / HDFS with the lookup values received from SharePlex. If they do not match, the data inconsistency is reported on the console and in shareplex-connector-alert.log.

REPLICATION PAUSED

Scenarios that may lead to data inconsistency where SharePlex Connector for Hadoop pauses replication of a table include:

  • For HDFS Near Real Time replication, the entire merge job in Hadoop fails. Replication of the table is paused because continuing would introduce further inconsistencies.
  • The schema of the table is changed (ALTER).

Data Inconsistent (Out of Sync) - Take a fresh snapshot

Scenarios that may lead to data inconsistency include: receiving a delete for a row which does not exist, receiving an update on a deleted row, receiving an insert on an already inserted row.
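
To make these three scenarios concrete, the sketch below tracks a set of row keys and flags each kind of out-of-sync change. It is a toy model for illustration only and is not how SharePlex Connector for Hadoop is implemented. The names rows, row_exists, and apply_change are hypothetical, and row keys are assumed to be simple values without spaces.

```shell
#!/bin/sh
# Toy model only: NOT the connector's implementation.
rows=" "   # space-delimited set of row keys currently present in the target

# row_exists KEY: succeeds if KEY is in the set.
row_exists() {
    case "$rows" in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# apply_change OP KEY: prints "ok" and returns 0, or prints the
# inconsistency and returns 1.
apply_change() {
    op=$1
    key=$2
    case "$op" in
        insert)
            if row_exists "$key"; then
                echo "out of sync: insert on an already inserted row ($key)"
                return 1
            fi
            rows="$rows$key "
            ;;
        update)
            if ! row_exists "$key"; then
                echo "out of sync: update on a deleted or missing row ($key)"
                return 1
            fi
            ;;
        delete)
            if ! row_exists "$key"; then
                echo "out of sync: delete for a row which does not exist ($key)"
                return 1
            fi
            # Remove " KEY " from the set by joining what precedes and follows it.
            before=${rows%%" $key "*}
            after=${rows#*" $key "}
            rows="$before $after"
            ;;
        *)
            echo "unknown operation: $op"
            return 2
            ;;
    esac
    echo "ok"
}
```

Calling apply_change insert 101 twice in a row reports the second insert as out of sync. After apply_change delete 101, both a further delete and an update on key 101 are reported in the same way.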

In these cases, SharePlex Connector for Hadoop suggests that you take a new snapshot of the table. Follow these instructions.

1. SharePlex for Oracle - Stop the post queue

SharePlex for Oracle uses the post queue to send messages to the JMS queue. Stop the post queue before you take a snapshot.

Tip: Enter command /u01/app/shareplex/prod/bin > ./sp_ctrl to open the sp_ctrl ()> prompt.

» sp_ctrl ()> stop post queue PostQueueName

For more information on PostQueueName, see Configure ActiveMQ to work with SharePlex.

2. SharePlex Connector for Hadoop - Run conn_snapshot.sh

In SharePlex Connector for Hadoop, run the conn_snapshot.sh script for the Oracle table associated with the REPLICATION PAUSED or DATA_INCONSISTENT message. This makes a fresh copy of that Oracle table.

This script is fully documented in conn_snapshot.sh.

» conn_snapshot.sh -t Schema.Table -s ';'

NOTE: You will be prompted to enter the Oracle password. This is the password for the Oracle username supplied during configuration. See "Run conn_setup.sh" for more information.

3. SharePlex for Oracle - Start the post queue

Return to SharePlex for Oracle. Start the post queue so SharePlex for Oracle can send messages from the post queue to the JMS queue.

» sp_ctrl ()> start post queue PostQueueName