
SharePlex Connector for Hadoop 1.5 - Installation and Setup Guide

Command reference

conn_setup.sh

Use this script to set up SharePlex Connector for Hadoop and provide the necessary configuration details. Setup is usually a one-time activity. This script is similar to ora_setup in SharePlex for Oracle.

Shell Script Usage

[user@host bin]$ ./conn_setup.sh [-c <INPUT_FILE>] [-r] [-n] [-d] [--help] [--version]

Options

Parameter

Description

-c <INPUT_FILE>

Provide an input file that defines the configuration parameters and their values.

Specify each configuration parameter on a new line of the input file.

Format: Key=Value. Example: enableRealTimeCopy=True

You can use this option to modify your original configuration. When conn_setup.sh is run without parameters, the configuration file it creates is conf/connectorConfiguration.xml.

An example input file is located at conf/conn_setup_template.properties.
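For example, a minimal setup run driven by an input file might look like the following. The file name is illustrative, and enableRealTimeCopy is the parameter shown in the example above; see conf/conn_setup_template.properties for the full set of parameter names.

[user@host bin]$ cat my_setup.properties
enableRealTimeCopy=True
[user@host bin]$ ./conn_setup.sh -c my_setup.properties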

-r

Use to update the configuration:

  • Add/Update the parameters for HBase Real Time Replication: the HBase column family name.
  • Tables will now replicate using HBase Real Time Replication unless otherwise specified in conn_snapshot.sh.

-n

Use to update the configuration:

  • Add/Update the parameters for HDFS Near Real Time Replication: the HDFS destination directory and how often data is copied to HDFS, measured by elapsed time and by number of changes.
  • Tables will now replicate using HDFS Near Real Time Replication unless otherwise specified in conn_snapshot.sh.

-d

Use to update the configuration:

  • Add/Update the parameters for Change Data Capture (the CDC feature): the CDC destination directory.

--help

Show this help and exit.

--version

Show version information and exit.

Note: You must restart the connector after enabling any of these features with the -r, -n, or -d option.
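For example, to enable HDFS Near Real Time Replication and then pick up the change (conn_ctrl.sh is described later in this command reference):

[user@host bin]$ ./conn_setup.sh -n
[user@host bin]$ ./conn_ctrl.sh restart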

Configuration Parameters

If conn_setup.sh is run without an input file, you are prompted to supply a value for each of the configuration parameters. For more information, see Run conn_setup.sh.

Default values are provided within brackets. Press Enter to select the default value.

conn_snapshot.sh

Use this script to take a fresh copy of an Oracle table for replication.

Shell Script Usage

conn_snapshot.sh -t <TABLE_OWNER.TABLE_NAME> [-f <FILE_TYPE>] [-s <FIELD_SEPARATOR>] [-e <CREATE_EXTERNAL_TABLE>] [-h <HIVE_HOME_DIR>] [-m <NUM_OF_MAPPERS>] [-n <CHANGES_THRESHOLD>] [-i <TIME_THRESHOLD>] [-r] [-d] [-v] [--help] [--version]

Options

Parameter

Description

-t <TABLE_OWNER.TABLE_NAME>

Name and owner of the table to import from Oracle. Required.

-f <FILE_TYPE>

Applicable to HDFS Near Real Time Replication.

File type for import. [Text|Sequence] (Default = Text. Use -f Sequence for the Sequence file type.)

-f Sequence

All data is replicated to Sequence files on HDFS. To read/write the Sequence files you need access to the Writable classes that were used to generate them. See the SharePlex Connector for Hadoop lib/sqoop-records directory and lib/sqoop-records.jar.

SharePlex Connector for Hadoop does not copy to Hive over HDFS for the Sequence file type.
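For example, to take a snapshot that writes Sequence files instead of text files:

[user@host bin]$ ./conn_snapshot.sh -t Schema.Table -f Sequence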

-s <FIELD_SEPARATOR>

The separator between each field/column. The separator must be enclosed in single quotes.

-e <CREATE_EXTERNAL_TABLE>

Copy to Hive. [true|false] (Default = false. Use -e true to enable.)

If true then

  • Copy to Hive over HDFS if a copy is taken for HDFS Near Real Time Replication.
  • Copy to Hive over HBase if a copy is taken for HBase Real Time Replication.

-h <HIVE_HOME_DIR>

Path to the Hive home directory.

If not specified, the value of the HIVE_HOME environment variable is used. If neither this option nor the HIVE_HOME environment variable is set, the path defaults to a location relative to HADOOP_HOME.
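For example, to copy to Hive with an explicitly specified Hive home directory (the path shown is illustrative):

[user@host bin]$ ./conn_snapshot.sh -t Schema.Table -e true -h /usr/lib/hive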

-m <NUM_OF_MAPPERS>

The number of mappers to be used.

-n <CHANGES_THRESHOLD>

Use to override the default setting for how often SharePlex Connector for Hadoop replicates the table (measured by the number of changes to the table).

  • Applicable to HDFS Near Real Time Replication.
  • The default setting was set in conn_setup.sh.
  • Replication is triggered by whichever condition is met first: the given number of changes to the table, or the time period set with -i.
  • SharePlex Connector for Hadoop remembers this setting and uses it in subsequent executions of conn_snapshot.sh.

-i <TIME_THRESHOLD>

Use to override the default setting for how often SharePlex Connector for Hadoop replicates the table (measured by the number of minutes).

  • Applicable to HDFS Near Real Time Replication.
  • The default setting was set in conn_setup.sh.
  • Replication is triggered by whichever condition is met first: the number of changes to the table set with -n, or the given time period.
  • SharePlex Connector for Hadoop remembers this setting and uses it in subsequent executions of conn_snapshot.sh.

[-r] [-d]

Use to override the settings in conn_setup.sh. If neither option is specified, this Oracle table is replicated using HBase and/or HDFS as per the settings in conn_setup.sh.

-r

A copy of the table is taken for HBase Real Time Replication.

Do not replicate this Oracle table using HDFS.

This overrides the settings in conn_setup.sh.

-d

A copy of the table is taken for HDFS Near Real Time Replication.

Do not replicate this Oracle table using HBase.

This overrides the settings in conn_setup.sh.

-r -d

A copy of the table is taken for HBase Real Time Replication and HDFS Near Real Time Replication.

This overrides the settings in conn_setup.sh.

SharePlex Connector for Hadoop remembers these settings and uses them in subsequent executions of conn_snapshot.sh.

-v

Verbose: show detailed information about each step.

--help

Show this help and exit.

--version

Show version information and exit.

Example

[user@host bin]$ ./conn_snapshot.sh -t Schema.Table -s ';'

Use Cases

 

Take a copy of the Oracle table Schema.Table for replication over HDFS and/or HBase as per the settings in conn_setup.sh.

conn_snapshot.sh -t Schema.Table -s ';'

As above and ...

  • Copy to Hive over HDFS if a copy is taken for HDFS Near Real Time Replication.
  • Copy to Hive over HBase if a copy is taken for HBase Real Time Replication.

conn_snapshot.sh -t Schema.Table -s ';' -e true

Take a copy of the Oracle table Schema.Table for replication over HBase. Do not replicate over HDFS.

conn_snapshot.sh -t Schema.Table -s ';' -r

Take a copy of the Oracle table Schema.Table for replication over HDFS. Do not replicate over HBase. Replication is set for every 20 minutes or 100 changes, whichever comes first.

conn_snapshot.sh -t Schema.Table -s ';' -d -i 20 -n 100
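A further example, combining the -r and -d options described above: take a copy of the Oracle table Schema.Table for replication over both HBase and HDFS, overriding the settings in conn_setup.sh.

conn_snapshot.sh -t Schema.Table -s ';' -r -d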

conn_cdc.sh

Use this script to start capturing change data for insert, update, delete, and before-update operations on HDFS for a specified table.

Note: For the conn_cdc.sh script to run, the change data capture feature must be enabled in the setup script (the -d option of conn_setup.sh). Capturing change data for a table starts only when you execute conn_cdc.sh for that table.
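For example, if the CDC feature was not enabled during the initial setup, a typical sequence to enable it and begin capturing changes for a table would be:

[user@host bin]$ ./conn_setup.sh -d
[user@host bin]$ ./conn_ctrl.sh restart
[user@host bin]$ ./conn_cdc.sh -t Schema.Table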

Shell Script Usage

conn_cdc.sh -t <TABLE_OWNER.TABLE_NAME> [-i] [-u] [-d] [-b] [--help] [--version]

Options

Parameter

Description

-t <TABLE_OWNER.TABLE_NAME>

Name and owner of the table to import from Oracle. Required.

-i

If specified, enables change data capture for insert operations.

-u

If specified, enables change data capture for update operations.

-d

If specified, enables change data capture for delete operations.

-b

If specified, enables change data capture for before-update values.

--help

Show this help and exit.

--version

Show version information and exit.

Example

[user@host bin]$ ./conn_cdc.sh -t Schema.Table

Use Cases

 
If none of the parameters [-i] [-u] [-d] [-b] is provided, all captured changes (insert, update, and delete) are recorded by default. Before-update values are not recorded.

conn_cdc.sh -t Schema.Table

Capture change data for insert operations only. The other operations are not recorded.

conn_cdc.sh -t Schema.Table -i

Capture change data for update and delete operations only. Insert and before-update operations are not recorded.

conn_cdc.sh -t Schema.Table -u -d

Capture change data for update and before-update operations. The other operations are not recorded.

conn_cdc.sh -t Schema.Table -u -b

Note: Execute conn_cdc.sh again for the same table if you want to change which operations are captured.
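For example, to switch a table from capturing only inserts to capturing updates and deletes, run the script again with the new options:

[user@host bin]$ ./conn_cdc.sh -t Schema.Table -i
[user@host bin]$ ./conn_cdc.sh -t Schema.Table -u -d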

 

Accessing captured change data (CDC) written on HDFS

Change data written on HDFS for a particular table can be accessed in two ways, as described below.

  1. Hive external tables pointing to CDC data on HDFS.

    SharePlex Connector for Hadoop creates external tables in Hive for the CDC feature. The external Hive tables are created in a Hive database with the same name as the Oracle table owner/schema. Each external Hive table points to the data stored for that table in the CDC HDFS destination directory (see the query example after this list).

    SharePlex Connector for Hadoop maps Oracle data types to appropriate data types in the external Hive table, as shown below. This allows you to use the built-in Hive UDFs available for the specific Hive data types.

    Oracle data type                    Hive data type
    CHAR                                STRING
    VARCHAR2                            STRING
    NCHAR                               STRING
    NVARCHAR2                           STRING
    NUMBER                              DOUBLE
    FLOAT                               DOUBLE
    DATE                                TIMESTAMP
    TIMESTAMP                           TIMESTAMP
    TIMESTAMP WITH TIME ZONE            STRING
    TIMESTAMP WITH LOCAL TIME ZONE      STRING
    INTERVAL YEAR TO MONTH              STRING
    INTERVAL DAY TO SECOND              STRING
    ROWID                               STRING

    Note: If the data type of any column in the Oracle table is modified, the Hive data type for that particular column is remapped to STRING.

  2. Custom Map-Reduce programs

    SharePlex Connector for Hadoop generates change data record classes (Java utility classes) for each CDC table; custom Map-Reduce programs can use these classes to read the CDC data stored on HDFS. The change data record classes are generated under the lib directory and are also added to the lib/change-data-records.jar file.
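As an illustration of the first approach, the external Hive table can be queried like any other Hive table once it has been created. The database and table names below are placeholders for your Oracle schema and table names:

[user@host bin]$ hive -e 'SELECT * FROM schema_name.table_name LIMIT 10;'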

conn_ctrl.sh

Use this script to start, restart, or shut down SharePlex Connector for Hadoop, and to reload the SharePlex Connector for Hadoop configuration.

Shell Script Usage

conn_ctrl.sh [start|restart|shutdown|reload-conf|--help|--version|--status]

Options

Parameter

Description

start

Start SharePlex Connector for Hadoop.

restart

Stop and start SharePlex Connector for Hadoop.

shutdown

Stop SharePlex Connector for Hadoop.

reload-conf

Reload some of the SharePlex Connector for Hadoop configuration settings without restarting SharePlex Connector for Hadoop. These settings include:

  • The HDFS Near Real Time Replication changes threshold and time interval.

--help

Show this help and exit.

--version

Show version information and exit.

--status

Show the status (started/shutdown) of SharePlex Connector for Hadoop.

Example

[user@host bin]$ ./conn_ctrl.sh
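Further examples using the options described above: restart the connector, reload the configuration, and check whether the connector is running.

[user@host bin]$ ./conn_ctrl.sh restart
[user@host bin]$ ./conn_ctrl.sh reload-conf
[user@host bin]$ ./conn_ctrl.sh --status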
