SharePlex Connector for Hadoop is distributed in the archive: shareplex-hadoop-connector-version-hadoopDistributionVersion.tar.gz where version identifies the SharePlex Connector for Hadoop release.
Note: The word "beta" is appended to the end of the name if this is a beta version.
Extract the archive whose hadoopDistributionVersion matches your Hadoop installation, on a machine where the Hadoop libraries and configurations are present. Use the command:
$ tar -xzf shareplex-hadoop-connector-version-hadoopDistributionVersion.tar.gz
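The extraction step can be wrapped in a small guard that fails early when the archive is missing. This is a minimal sketch: `extract_connector` is a hypothetical helper, not part of the product, and the archive path is passed in as an argument because the real file name depends on your version and Hadoop distribution.

```shell
# Sketch: extract the connector archive into a target directory.
# extract_connector is a hypothetical helper, not shipped with the product.
extract_connector() {
    # $1 = path to the shareplex-hadoop-connector archive (.tar.gz)
    # $2 = destination directory (defaults to the current directory)
    if [ ! -f "$1" ]; then
        echo "Archive not found: $1" >&2
        return 1
    fi
    mkdir -p "${2:-.}" || return 1
    # -x extract, -z gunzip, -f archive file, -C change to the destination
    tar -xzf "$1" -C "${2:-.}"
}
```

A call would look like `extract_connector /path/to/archive.tar.gz /path/to/destination`, with both paths chosen for your environment.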
The archive contains the following files:
File | Description |
---|---|
install.sh | A shell script that installs/upgrades SharePlex™ Connector for Hadoop® and other programs in the archive. Installing a SharePlex Connector for Hadoop upgrade or reinstalling the software without un-installing it preserves existing data files. Before you upgrade or re-install SharePlex Connector for Hadoop, see the Release Notes for the version you are installing to familiarize yourself with any special upgrade or installation requirements. |
shareplex-hadoop-connector-version-hadoopDistributionVersion.tar | SharePlex™ Connector for Hadoop® archive. |
db-derby-10.9.1.0-bin.tar.gz | Apache Derby installable. SharePlex™ Connector for Hadoop® uses the Apache Derby network server and creates a database for storing metadata and status information. |
sqoop-version.bin__hadoop-version.tar.gz | Apache Sqoop installable. Sqoop is a tool designed to transfer bulk data between Apache Hadoop and structured data stores. |
quest-oraoop-1.6.0-date-version.tar.gz | Data Connector for Oracle and Hadoop archive. Data Connector for Oracle and Hadoop is an optional plugin to Sqoop. It facilitates the movement of data between Oracle and Hadoop. |
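After extraction, it may help to confirm that the files listed in the table are present before running the installer. The sketch below is an illustration only: `check_connector_files` is a hypothetical helper, and the wildcards stand in for the version-specific parts of each file name, which are not spelled out here.

```shell
# Sketch: confirm the extracted directory holds the files listed in the table.
# check_connector_files is a hypothetical helper, not shipped with the product.
check_connector_files() {
    dir=$1
    missing=0
    for pattern in \
        'install.sh' \
        'shareplex-hadoop-connector-*.tar' \
        'db-derby-*-bin.tar.gz' \
        'sqoop-*.tar.gz' \
        'quest-oraoop-*.tar.gz'
    do
        # Expand the glob inside the directory; an unmatched glob stays
        # literal, so the existence test below fails for it.
        set -- "$dir"/$pattern
        if [ ! -e "$1" ]; then
            echo "Missing: $pattern" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns 0 when every pattern matches at least one file and 1 otherwise, so it can gate the call to install.sh.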
The install.sh shell script installs or upgrades the programs in the SharePlex Connector for Hadoop archive.
Important (upgrades only): You do not need to uninstall SharePlex Connector for Hadoop before upgrading. Install the upgrade over the existing version.
Shell Script Usage
[user@host bin]$ ./install.sh [-h <HADOOP_HOME_DIR>] [-c <HADOOP_CONF_DIR>] [-b <HBASE_HOME_DIR>] [-v <HIVE_HOME_DIR>] [--help] [--version]
Options
Parameter | Description |
---|---|
-h <HADOOP_HOME_DIR> | The path to the Hadoop home directory. This option overrides HADOOP_HOME in the environment. If this option is not set and the HADOOP_HOME environment variable is also not set, this parameter is set to /usr/lib/hadoop as default. |
-c <HADOOP_CONF_DIR> | The path to the Hadoop conf directory. This option overrides HADOOP_CONF_DIR in the environment. If this option is not set and the HADOOP_CONF_DIR environment variable is also not set, this parameter is set to |
-b <HBASE_HOME_DIR> | The path to HBase home directory. This option overrides HBASE_HOME in the environment. If this option is not set and the HBASE_HOME environment variable is also not set, this parameter is set relative to HADOOP_HOME. |
-v <HIVE_HOME_DIR> | The path to Hive home directory. This option overrides HIVE_HOME in the environment. |
--help | Show this help and exit. |
--version | Show version information and exit. |
Note: The optional parameters -h, -c, -b, and -v apply to a fresh installation only. For an upgrade, the install script reads the environment variables from bin/shareplex_hadoop_env.sh.
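The precedence described for the path options (command-line option first, then the environment variable, then a default) can be illustrated with a short sketch. `resolve_setting` is a hypothetical helper written for this illustration; it does not claim to reproduce how install.sh is implemented. Only the /usr/lib/hadoop default for -h is taken from the options table.

```shell
# Sketch of the documented precedence: option value, then environment
# variable, then built-in default. resolve_setting is a hypothetical helper.
resolve_setting() {
    # $1 = value from the command-line option (may be empty)
    # $2 = value from the environment variable (may be empty)
    # $3 = built-in default
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$3"
    fi
}

# Example: no -h option given, so HADOOP_HOME (if set) wins over the default.
resolve_setting "" "${HADOOP_HOME:-}" /usr/lib/hadoop
```

Passing a non-empty first argument, as a -h value would be, makes the function ignore both the environment variable and the default.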
When the shell script has finished executing, install.sh starts the Apache Derby network server.
Run the conn_setup.sh script to set up SharePlex Connector for Hadoop and provide the necessary configuration details. This is usually a one-time activity.
Shell Script Usage
[user@host bin]$ ./conn_setup.sh
TIP: See conn_setup.sh for a complete description of this command.
Configuration Parameters
The script prompts you for each configuration parameter, one by one.
Note: Default values are provided within brackets. Press Enter to select the default value.
Categories of detail | You will be prompted to provide the following details. |
---|---|
Do you want to enable Hadoop connector to copy data to HDFS? Answer yes to this question if you intend to replicate all (or most) tables by HDFS Near Real Time Replication. | |
Do you want to enable Hadoop connector to copy data to HBase? Answer yes to this question if HBase is set up in your environment and you intend to replicate all (or most) tables by HBase Real Time Replication. | |
CDC | Do you intend to capture change data? Answer yes to this question to enable the SharePlex Connector for Hadoop change history feature (formerly known as Change Data Capture), which maintains a row-based history of every change made to the source database. Make sure you are using SharePlex for Oracle version 8.5 or later if you intend to enable the change history feature. |
HBase parameters | You will be prompted for this detail if you responded YES to replicate tables to HBase. |
HDFS parameters | You will be prompted for this detail if you responded YES to replicate tables to HDFS. The first question relates to time (in minutes). For example, if you enter 10, the table is replicated every 10 minutes. Do not set this value below 10 minutes. The second question relates to the number of changes. For example, if you enter 2, replication runs after 2 changes to the table. Replication runs as soon as either condition is met: the given number of changes to the table or the set time period, whichever comes first. |
CDC Parameters | You will be prompted for this detail if you responded YES to capture change data (change history). SharePlex Connector for Hadoop maintains an internal change threshold of 1000 and an internal time threshold of 15 seconds (by default). After every 1000 changes to the table or after every 15 seconds, change data is written to the specified destination directory on HDFS. |
For more information on each of these properties see Configure ActiveMQ to work with SharePlex | |
Oracle parameters | You will be prompted to enter the Oracle password when taking a snapshot. Refer to the use cases for more information. |
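The HDFS and CDC rows in the table above both describe a "whichever comes first" trigger: work runs once either a change count or a time period is reached. The sketch below only illustrates that rule. `should_replicate` is a hypothetical function, not connector code, and the values are the examples and defaults quoted in the table.

```shell
# Sketch: "whichever comes first" trigger. should_replicate is hypothetical.
should_replicate() {
    # $1 = changes since the last run, $2 = time elapsed since the last run
    # $3 = change threshold,           $4 = time threshold (same unit as $2)
    [ "$1" -ge "$3" ] || [ "$2" -ge "$4" ]
}

# HDFS example from the table: every 10 minutes or every 2 changes.
if should_replicate 2 3 2 10; then echo "replicate"; else echo "wait"; fi   # change threshold reached
if should_replicate 1 10 2 10; then echo "replicate"; else echo "wait"; fi  # time threshold reached
if should_replicate 1 3 2 10; then echo "replicate"; else echo "wait"; fi   # neither reached

# CDC defaults from the table: 1000 changes or 15 seconds.
if should_replicate 400 15 1000 15; then echo "write change data"; fi
```

The first two HDFS calls print `replicate`, the third prints `wait`, and the CDC call prints `write change data` because the 15-second threshold is reached even though only 400 changes have accumulated.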
Optional Configuration Parameter
The connector setup script will not prompt for the following optional configuration parameter.
derbyUpdationInterval parameter: This parameter is in the CONNECTOR_HOME/conf/connectorConfigurations.xml file.
This parameter controls how often SharePlex Connector for Hadoop validates the Derby connection and re-establishes it (if needed) before performing any Derby related operations. This check ensures connection integrity in case the connection with Derby goes down. These Derby checks will be performed periodically after the specified time interval. By default this parameter is set to 1 minute.
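To see the current setting before editing it, you can search the configuration file for the parameter name. This sketch assumes nothing about the XML layout of connectorConfigurations.xml; it prints the matching lines with their line numbers. `show_derby_interval` is a hypothetical helper.

```shell
# Sketch: locate derbyUpdationInterval in the connector configuration file.
# show_derby_interval is a hypothetical helper, not shipped with the product.
show_derby_interval() {
    # $1 = path to connectorConfigurations.xml
    if [ ! -f "$1" ]; then
        echo "Configuration file not found: $1" >&2
        return 1
    fi
    # -n prints line numbers so the entry is easy to find in an editor.
    grep -n 'derbyUpdationInterval' "$1"
}
```

With CONNECTOR_HOME set to your installation directory, the call would be `show_derby_interval "$CONNECTOR_HOME/conf/connectorConfigurations.xml"`.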
Configuration Complete
SharePlex Connector for Hadoop shows the following messages indicating that it is ready for use.
connectorConfiguration.Xml updated successfully.
JMSConfiguration.xml updated successfully.
OraOopConfiguration.xml updated successfully.
Connector setup completed successfully.
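If you capture the setup output to a file, the final message above can be checked automatically. This is a minimal sketch: `setup_completed` is a hypothetical helper, and the log file name in the usage note is an arbitrary choice.

```shell
# Sketch: check captured conn_setup.sh output for the final success message.
# setup_completed is a hypothetical helper, not shipped with the product.
setup_completed() {
    # $1 = file holding the captured output of conn_setup.sh
    # -q suppresses output, -F matches the message as a fixed string
    grep -qF 'Connector setup completed successfully.' "$1"
}
```

One way to produce the log, assuming the interactive prompts remain usable when piped, is `./conn_setup.sh 2>&1 | tee setup.log`, followed by `setup_completed setup.log && echo "ready"`.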
© 2024 Quest Software Inc. ALL RIGHTS RESERVED.