SharePlex Connector for Hadoop 1.5 - Installation and Setup Guide

Additional - To replicate tables in Hive over HBase

If you intend to replicate tables in Hive over HBase, complete the following additional setup steps.

Ensure that the ZooKeeper quorum and client port are configured in $HIVE_HOME/conf/hive-site.xml.

<property>
  <name>hbase.zookeeper.quorum</name>
  <value> ---- PLEASE SPECIFY ---- </value>
  <description>A comma separated list (with no spaces) of the IP addresses of all ZooKeeper servers in the cluster.</description>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value> ---- PLEASE SPECIFY ---- </value>
  <description>The ZooKeeper client port. Default clientPort is 2181.</description>
</property>
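
As a quick sanity check before starting Hive, the two property names above can be searched for in hive-site.xml. The following is a minimal sketch, not part of SharePlex Connector for Hadoop: the helper names has_property and check_hive_site are hypothetical, and the check only confirms that each name element is present, not that its value has been filled in.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product):
# succeeds if FILE contains a <name>NAME</name> element.
has_property() {
    file=$1
    name=$2
    grep -qF "<name>$name</name>" "$file"
}

# Hypothetical wrapper: print each required property that is missing
# and return non-zero if any is absent.
check_hive_site() {
    file=$1
    missing=0
    for name in hbase.zookeeper.quorum hbase.zookeeper.property.clientPort; do
        if ! has_property "$file" "$name"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumes HIVE_HOME is set in your environment):
# check_hive_site "$HIVE_HOME/conf/hive-site.xml"
```

The function returns a non-zero status when a property is absent, so it can be used as a guard in a wrapper script.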

Pass --auxpath when starting the Hive command line:

hive --auxpath <HIVE_HOME>/lib/hive-hbase-handler-<version>.jar,<HBASE_HOME>/hbase.jar,<HBASE_HOME>/lib/zookeeper.jar,<HBASE_HOME>/lib/guava-<version>.jar
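
Because the --auxpath value is a long comma-separated list with no spaces, it can be easier to assemble it in a small function. This is a sketch only: build_auxpath is a hypothetical helper, and the paths and version numbers in the usage comment are placeholders to be replaced with the values on your system.

```shell
#!/bin/sh
# Hypothetical helper: assemble the comma-separated --auxpath value.
# Arguments: Hive home, HBase home, hive-hbase-handler version, guava version.
build_auxpath() {
    hive_home=$1
    hbase_home=$2
    handler_ver=$3
    guava_ver=$4
    printf '%s,%s,%s,%s\n' \
        "$hive_home/lib/hive-hbase-handler-$handler_ver.jar" \
        "$hbase_home/hbase.jar" \
        "$hbase_home/lib/zookeeper.jar" \
        "$hbase_home/lib/guava-$guava_ver.jar"
}

# Usage (placeholder paths and versions, adjust to your installation):
# hive --auxpath "$(build_auxpath "$HIVE_HOME" "$HBASE_HOME" 0.10.0 11.0.2)"
```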

When using CDH 4.2 or CDH 5.0.0 with Kerberos authentication

To replicate Hive over HBase, export the HIVE_OPTS environment variable with the Kerberos parameters shown below.

export HIVE_OPTS="-hiveconf hbase.security.authentication=kerberos -hiveconf hbase.rpc.engine=org.apache.hadoop.hbase.ipc.SecureRpcEngine -hiveconf hbase.master.kerberos.principal=hbase/_HOST@KerberosRelm -hiveconf hbase.regionserver.kerberos.principal=hbase/_HOST@KerberosRelm -hiveconf hbase.zookeeper.quorum=zookeeperQuorum"

TIP: The Kerberos utility kinit identifies you to the Kerberos server. Run kinit once at the start of each new session.

SharePlex Connector for Hadoop

Unpack the Archive

SharePlex Connector for Hadoop is distributed in the archive: shareplex-hadoop-connector-version-hadoopDistributionVersion.tar.gz where version identifies the SharePlex Connector for Hadoop release.

Extract the archive with hadoopDistributionVersion appropriate to your Hadoop installation on a machine where Hadoop libraries and configurations are present. Use the command:

$ tar -xzf shareplex-hadoop-connector-version-hadoopDistributionVersion.tar.gz

The archive contains the following files:

File	Description
install.sh	A shell script that installs or upgrades SharePlex™ Connector for Hadoop® and the other programs in the archive. Upgrading SharePlex Connector for Hadoop, or reinstalling it without first uninstalling it, preserves existing data files. Before you upgrade or reinstall, see the Release Notes for the version you are installing for any special upgrade or installation requirements.
shareplex-hadoop-connector-hadoopDistributionVersion.tar	SharePlex™ Connector for Hadoop® archive.
db-derby-10.9.1.0-bin.tar.gz	Apache Derby installable. SharePlex™ Connector for Hadoop® uses the Apache Derby network server and creates a database for storing metadata and status information.
sqoop-quest-1.4.3.bin__hadoop-version.tar.gz	Apache Sqoop installable. Sqoop is a tool designed to transfer bulk data between Apache Hadoop and structured data stores.
quest-oraoop-1.6.0-date-version.tar.gz	Data Connector for Oracle and Hadoop archive. Data Connector for Oracle and Hadoop is an optional plugin to Sqoop. It facilitates the movement of data between Oracle and Hadoop.

Run install.sh

This shell script installs/upgrades programs in the SharePlex Connector for Hadoop archive.

Important (upgrades only): You do not need to uninstall SharePlex Connector for Hadoop before upgrading. Install the upgrade over the existing version.

Shell Script Usage

[user@host bin]$ ./install.sh [-h <HADOOP_HOME_DIR>] [-c <HADOOP_CONF_DIR>] [-b <HBASE_HOME_DIR>] [-v <HIVE_HOME_DIR>] [--help] [--version]

Options

Parameter	Description
-h <HADOOP_HOME_DIR>	The path to the Hadoop home directory. This option overrides HADOOP_HOME in the environment. If neither this option nor the HADOOP_HOME environment variable is set, the default is /usr/lib/hadoop.
-c <HADOOP_CONF_DIR>	The path to the Hadoop conf directory. This option overrides HADOOP_CONF_DIR in the environment. If neither this option nor the HADOOP_CONF_DIR environment variable is set, the default is HADOOP_HOME/conf (for HDP / IDH / Apache) or HADOOP_HOME/etc/hadoop (for CDH4 or CDH 5.0.0).
-b <HBASE_HOME_DIR>	The path to HBase home directory. This option overrides HBASE_HOME in the environment. If this option is not set and the HBASE_HOME environment variable is also not set, this parameter is set relative to HADOOP_HOME.
-v <HIVE_HOME_DIR>	The path to Hive home directory. This option overrides HIVE_HOME in the environment.
--help	Show this help and exit.
--version	Show version information and exit.

Note: The optional parameters -h, -c, -b, and -v apply to a fresh installation only. For an upgrade, the install script reads the environment variables from bin/shareplex_hadoop_env.sh.
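
The precedence described for -h (command-line option first, then the environment variable, then the built-in default) can be illustrated as follows. This is a sketch of the documented behavior under that reading, not the actual code of install.sh; resolve_hadoop_home is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the documented precedence for the Hadoop home directory.
# This is an illustration, not the implementation used by install.sh.
# Argument: the value passed with -h, or an empty string if -h was not given.
resolve_hadoop_home() {
    opt_value=$1
    if [ -n "$opt_value" ]; then
        # 1. The -h option overrides everything else.
        printf '%s\n' "$opt_value"
    elif [ -n "${HADOOP_HOME:-}" ]; then
        # 2. Otherwise fall back to the environment variable.
        printf '%s\n' "$HADOOP_HOME"
    else
        # 3. Otherwise use the documented default.
        printf '%s\n' /usr/lib/hadoop
    fi
}

# Usage:
# resolve_hadoop_home ""            # environment variable or default
# resolve_hadoop_home /opt/hadoop   # explicit option value wins
```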

When the shell script has finished executing, install.sh starts the Apache Derby network server.

About the new shareplex_hadoop_connector directory

Files and Directories	Description
bin	SharePlex Connector for Hadoop shell scripts as documented in this guide. This directory also contains shareplex_hadoop_env.sh, which SharePlex Connector for Hadoop uses. To set the environment variables in your shell, run: source shareplex_hadoop_env.sh
conf	SharePlex Connector for Hadoop configuration files.
db-derby-version-bin	Apache Derby application.
lib	SharePlex Connector for Hadoop required dependencies.
logs	SharePlex Connector for Hadoop log files.
shareplex_hadoop_connector.jar	The Java archive file containing SharePlex Connector for Hadoop application code.
oraoop-version	Data Connector for Oracle and Hadoop application.
sqoop-quest-version.bin	Apache Sqoop application.

Run conn_setup.sh

Run this script to set up SharePlex Connector for Hadoop and provide the necessary configuration details. This is usually a one-time activity.

Shell Script Usage

[user@host bin]$ ./conn_setup.sh

TIP: See Command reference for a complete description of this command.

Configuration Parameters

The script prompts you for each configuration parameter, one by one.

Note: Default values are provided within brackets. Press Enter to select the default value.

Categories of detail	You will be prompted to provide the following details.
HDFS	Do you want to enable Hadoop connector to copy data to HDFS? Answer yes to this question if you intend to replicate all (or most) tables by HDFS Near Real Time Replication.
HBase	Do you want to enable Hadoop connector to copy data to HBase? Answer yes to this question if HBase is set up in your environment and you intend to replicate all (or most) tables by HBase Real Time Replication.
CDC	Do you intend to capture change data? Answer yes to this question to enable Change Data Capture. The Change Data Capture feature requires SharePlex for Oracle version 8.5 or later.
HBase parameters	You will be prompted for this detail if you have responded YES to replicate tables to HBase: the HBase column family name.
HDFS parameters	You will be prompted for these details if you have responded YES to replicate tables to HDFS. (1) The HDFS destination directory. Treat this directory as used exclusively by SharePlex Connector for Hadoop; it may be cleaned up by conn_cleanup.sh and uninstall.sh. The script appends "/hdfs_replication" to the directory you enter and displays the resulting HDFS destination directory on the console. (2) How often to copy data to HDFS, measured by time and by number of changes. The first question relates to time (in minutes): if you enter 10, for example, the table is replicated every 10 minutes. Do not set this value below 10 minutes. The second question relates to the number of changes: if you enter 2, replication is executed after 2 changes to the table. Replication is executed on whichever condition is met first, the number of changes or the time period.
CDC Parameters	You will be prompted for this detail if you have responded YES to capture change data: the CDC destination directory. Treat this directory as used exclusively by SharePlex Connector for Hadoop. The script appends "/change_data_capture" to the directory you enter and displays the resulting CDC destination directory on the console. By default, SharePlex Connector for Hadoop maintains an internal change threshold of 1000 and an internal time threshold of 15 seconds, so change data is written to the CDC destination directory on HDFS after every 1000 changes to the table or after every 15 seconds.
JMS parameters	The name of the JMS queue (by default: OpenTarget). The name of the host running ActiveMQ (for example: localhost). The port number used by the JNDI provider.url property (by default: 61616). The port number used to access the ActiveMQ admin web site (by default: 8161). For more information on each of these properties, see Configure ActiveMQ to work with SharePlex.
Oracle parameters	The host name or TCP/IP address of the Oracle server. The port used to connect to the Oracle server (by default: 1521). The Oracle instance (SID) (by default: ORCL). The Oracle username. You will be prompted to enter the Oracle password when taking a snapshot; refer to the use cases for more information.
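
The "whichever comes first" rule for the HDFS parameters can be expressed as a simple two-threshold test. The sketch below only illustrates that rule; it is not how SharePlex Connector for Hadoop implements it, and should_replicate is a hypothetical name. The same shape applies to the CDC defaults (1000 changes or 15 seconds) if the time values are given in seconds instead of minutes.

```shell
#!/bin/sh
# Illustration of the "whichever comes first" trigger rule.
# Not the product's implementation; the function name is hypothetical.
# Arguments:
#   $1  time elapsed since the last copy
#   $2  number of changes accumulated since the last copy
#   $3  configured time threshold (same unit as $1)
#   $4  configured change-count threshold
# Succeeds (exit status 0) when either threshold has been reached.
should_replicate() {
    [ "$1" -ge "$3" ] || [ "$2" -ge "$4" ]
}

# Usage with a 10-minute and 2-change threshold:
# if should_replicate 3 2 10 2; then echo "replicate now"; fi
```

With thresholds of 10 minutes and 2 changes, a table that received 2 changes after 3 minutes is copied immediately, while a table with a single change waits until the 10 minutes have elapsed.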

Configuration Complete

SharePlex Connector for Hadoop shows the following messages indicating that it is ready for use.

connectorConfiguration.Xml updated successfully.
JMSConfiguration.xml updated successfully.
OraOopConfiguration.xml updated successfully.
Connector setup completed successfully.
