Use this script to set up SharePlex Connector for Hadoop and provide the necessary configuration details. Setup is usually a one-time activity. This script is similar to ora_setup in SharePlex for Oracle.
Shell Script Usage
[user@host bin]$ ./conn_setup.sh [-c <INPUT_FILE>] [-r] [-n] [-d] [--help] [--version]
Options
| Parameter | Description |
|---|---|
| -c <INPUT_FILE> | Provide an input file that defines the configuration parameters and their values. Specify each configuration parameter on a new line of the input file, in the format Key=Value, for example enableRealTimeCopy=True. You may wish to use this parameter to modify your original configuration. When conn_setup.sh is run without parameters, the file it creates is conf/connectorConfiguration.xml. An example input file is located at conf/conn_setup_template.properties. |
| -r | Use to update the configuration. |
| -n | Use to update the configuration. |
| -d | Use to update the configuration. |
| --help | Show this help and exit. |
| --version | Show version information and exit. |
Note: The user must restart the connector after enabling any of the features with the -r/-n/-d options.
Configuration Parameters
If conn_setup.sh is run without an input file, you will be prompted to supply a value for each of the configuration parameters. For more information, see Run conn_setup.sh.
Default values are provided in brackets. Press Enter to accept the default value.
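For non-interactive setup with -c, the input file is a plain Key=Value properties file. The following is a minimal sketch; only enableRealTimeCopy is taken from the documentation above, the file path is illustrative, and conf/conn_setup_template.properties should be consulted for the full parameter list.

```shell
# Create a minimal input file for conn_setup.sh -c.
# Only enableRealTimeCopy is documented above; see
# conf/conn_setup_template.properties for the real parameter list.
cat > /tmp/conn_setup.properties <<'EOF'
enableRealTimeCopy=True
EOF

# Then re-run setup non-interactively (path shown is illustrative):
# ./conn_setup.sh -c /tmp/conn_setup.properties
```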
Use this script to take a fresh copy of an Oracle table for replication.
Shell Script Usage
conn_snapshot.sh -t <TABLE_OWNER.TABLE_NAME> [-f <FILE_TYPE>] [-s <FIELD_SEPARATOR>] [-e <CREATE_EXTERNAL_TABLE>] [-h <HIVE_HOME_DIR>] [-m <NUM_OF_MAPPERS>] [-n <CHANGES_THRESHOLD>] [-i <TIME_THRESHOLD>] [-r] [-d] [-v] [--help] [--version]
Options
Example
[user@host bin]$ ./conn_snapshot.sh -t Schema.Table -s ';'
| Use Case | Command |
|---|---|
| Take a copy of the Oracle table Schema.Table for replication over HDFS and/or HBase as per the settings in conn_setup.sh. | conn_snapshot.sh -t Schema.Table -s ';' |
| As above, and also create an external table. | conn_snapshot.sh -t Schema.Table -s ';' -e true |
| Take a copy of the Oracle table Schema.Table for replication over HBase. Do not replicate over HDFS. | conn_snapshot.sh -t Schema.Table -s ';' -r |
| Take a copy of the Oracle table Schema.Table for replication over HDFS. Do not replicate over HBase. Replication is set for every 20 minutes or 100 changes, whichever comes first. | conn_snapshot.sh -t Schema.Table -s ';' -d -i 20 -n 100 |
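When several tables need the same snapshot settings, the calls above can be scripted. The following is a dry-run sketch: the table names are hypothetical placeholders, and the echo should be removed to actually execute the commands.

```shell
# Dry run: print one conn_snapshot.sh command per table.
# Table names are hypothetical placeholders; remove `echo` to execute.
TABLES="Schema.Orders Schema.Customers"
for t in $TABLES; do
  echo ./conn_snapshot.sh -t "$t" -s ';' -e true
done
```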
Use this script to start capturing change data for insert, update, delete, and before-update operations on HDFS for a specified table.
Note: The change data capture feature must be enabled in the setup script before the conn_cdc script can run. Capturing change data for a table starts only when the user executes the conn_cdc script for that table.
Shell Script Usage
conn_cdc.sh -t <TABLE_OWNER.TABLE_NAME> [-i] [-u] [-d] [-b] [--help] [--version]
Options
| Parameter | Description |
|---|---|
| -t <TABLE_OWNER.TABLE_NAME> | Name and owner of the table to import from Oracle. Required. |
| -i | If specified, enables change data capture for insert operation. |
| -u | If specified, enables change data capture for update operation. |
| -d | If specified, enables change data capture for delete operation. |
| -b | If specified, enables change data capture for before update operation values. |
| --help | Show this help and exit. |
| --version | Show version information and exit. |
Example
[user@host bin]$ ./conn_cdc.sh -t Schema.Table
| Use Case | Command |
|---|---|
| If none of the parameters -i, -u, -d, -b is provided, then all captured changes (insert, update, and delete) are recorded by default. Before-update values are not recorded. | conn_cdc.sh -t Schema.Table |
| Capture change data for insert operations only. Other operations are not recorded. | conn_cdc.sh -t Schema.Table -i |
| Capture change data for update and delete operations only. Insert and before-update operations are not recorded. | conn_cdc.sh -t Schema.Table -u -d |
| Capture change data for update and before-update operations. Other operations are not recorded. | conn_cdc.sh -t Schema.Table -u -b |
Note: Execute the conn_cdc script again for the same table if you want to change which operations are captured.
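Because re-running conn_cdc.sh for a table replaces its previous capture preference, switching a table from insert-only capture to update and delete capture is just a second invocation. A dry-run sketch (remove the echo to execute):

```shell
# Dry run: change the capture preference by re-running conn_cdc.sh.
TABLE="Schema.Table"
# Initially capture inserts only:
echo ./conn_cdc.sh -t "$TABLE" -i
# Later, re-run for the same table to capture updates and deletes instead:
echo ./conn_cdc.sh -t "$TABLE" -u -d
```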
Accessing captured change data (CDC) written on HDFS
Change data written to HDFS for a particular table can be accessed in two ways, as described below.
Hive external tables pointing to CDC data on HDFS.
SharePlex Connector for Hadoop creates external tables in Hive for the CDC feature. The external Hive tables are created in a Hive database with the same name as the Oracle table owner/schema. Each external Hive table points to the data stored for that table in the CDC HDFS destination directory.
SharePlex Connector for Hadoop maps Oracle data types to the appropriate data types in the external Hive table as shown below. This allows you to use the built-in Hive UDFs available for those Hive data types.
| Oracle data type | Hive data type |
|---|---|
| CHAR | STRING |
| VARCHAR2 | STRING |
| NCHAR | STRING |
| NVARCHAR2 | STRING |
| NUMBER | DOUBLE |
| FLOAT | DOUBLE |
| DATE | TIMESTAMP |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMP WITH TIME ZONE | STRING |
| TIMESTAMP WITH LOCAL TIME ZONE | STRING |
| INTERVAL YEAR TO MONTH | STRING |
| INTERVAL DAY TO SECOND | STRING |
| ROWID | STRING |
Note: If the data type of any column in the Oracle table is modified, the Hive data type for that column is mapped to STRING.
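Since the CDC data is exposed through an external Hive table in a database named after the Oracle schema, it can be queried with ordinary HiveQL. A sketch, assuming an Oracle table Schema.Table; the query is written to a file so it can be reviewed before running, and the hive invocation is left commented out.

```shell
# Write a simple HiveQL query against the CDC external table.
# The database and table names follow the owner/schema convention
# described above; Schema.Table is a placeholder.
cat > /tmp/cdc_count.hql <<'EOF'
SELECT COUNT(*) FROM schema.table;
EOF

# Run it with Hive (uncomment once Hive is on the PATH):
# hive -f /tmp/cdc_count.hql
```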
Custom Map-Reduce programs
SharePlex Connector for Hadoop generates change-data-record Java utility classes for each CDC table, which custom Map-Reduce programs can use to read the CDC data stored on HDFS. The change-data-record utility classes are generated under the lib directory and are also added to the lib/change-data-records.jar file.
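A custom Map-Reduce job only needs the generated classes on its classpath. A sketch, where the job jar and class names (my-cdc-job.jar, com.example.MyCdcJob) are hypothetical placeholders and the connector's lib directory is assumed to be the current directory:

```shell
# Make the generated change-data-record classes visible to the job driver:
export HADOOP_CLASSPATH="lib/change-data-records.jar:${HADOOP_CLASSPATH:-}"

# Submit the job, shipping the jar to the cluster as well (illustrative;
# jar, class, and path names are placeholders):
# hadoop jar my-cdc-job.jar com.example.MyCdcJob \
#   -libjars lib/change-data-records.jar <input-path> <output-path>
```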
Use this script to start, restart, or shut down SharePlex Connector for Hadoop, and to reload SharePlex Connector for Hadoop configurations.
Shell Script Usage
conn_ctrl.sh [start|restart|shutdown|reload-conf|--help|--version|--status]
Options
| Parameter | Description |
|---|---|
| start | Start SharePlex Connector for Hadoop. |
| restart | Stop and start SharePlex Connector for Hadoop. |
| shutdown | Stop SharePlex Connector for Hadoop. |
| reload-conf | Reload some of the SharePlex Connector for Hadoop configurations without restarting SharePlex Connector for Hadoop. These configurations include |
| --help | Show this help and exit. |
| --version | Show version information and exit. |
| --status | Show the status (started / shutdown) of SharePlex Connector for Hadoop. |
Example
[user@host bin]$ ./conn_ctrl.sh
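A typical lifecycle using the operations above, shown as a dry run (remove the echo to execute):

```shell
# Dry run: print a typical sequence of conn_ctrl.sh operations.
ACTIONS="start --status reload-conf shutdown"
for a in $ACTIONS; do
  echo ./conn_ctrl.sh "$a"
done
```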