Use this script to start capturing change data for insert, update, delete, and before update operations on HDFS for a specified table.
Note: For the conn_cdc script to run, the change data capture feature must be enabled in the setup script. Capturing change data for a table starts only when the user executes the conn_cdc script for that table.
Shell Script Usage
conn_cdc.sh -t <TABLE_OWNER.TABLE_NAME> [-i] [-u] [-d] [-b] [--help] [--version]
Options
| Parameter | Description |
|---|---|
| -t <TABLE_OWNER.TABLE_NAME> | Name and owner of the table to import from Oracle. Required. |
| -i | If specified, enables change data capture for insert operations. |
| -u | If specified, enables change data capture for update operations. |
| -d | If specified, enables change data capture for delete operations. |
| -b | If specified, enables change data capture for before update operation values. |
| --help | Show this help and exit. |
| --version | Show version information and exit. |
Example
[user@host bin]$ ./conn_cdc.sh -t Schema.Table
| Use Case | Command |
|---|---|
| If none of the parameters -i, -u, -d, or -b is provided, all captured changes (insert, update, and delete) are recorded by default. Before update values are not recorded. | conn_cdc.sh -t Schema.Table |
| Capture change data for insert operations only. The other operations are not recorded. | conn_cdc.sh -t Schema.Table -i |
| Capture change data for update and delete operations only. Insert and before update operations are not recorded. | conn_cdc.sh -t Schema.Table -u -d |
| Capture change data for update and before update operations. The other operations are not recorded. | conn_cdc.sh -t Schema.Table -u -b |
Note: Execute the conn_cdc script again for the same table if you want to change which operations are captured.
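For example, a table first configured to capture only inserts can later be switched to capture updates and deletes by running the script again with the new options (Schema.Table is a placeholder for your own owner.table pair):
[user@host bin]$ ./conn_cdc.sh -t Schema.Table -i
[user@host bin]$ ./conn_cdc.sh -t Schema.Table -u -d   # re-running replaces the earlier capture preference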
Accessing captured change data (CDC) written on HDFS
Change data written to HDFS for a particular table can be accessed in two ways, as described below.
Hive external tables pointing to CDC data on HDFS
SharePlex Connector for Hadoop creates external tables in Hive for the CDC feature. The external Hive tables are created under a Hive database whose name matches the Oracle table owner / schema. Each external Hive table points to the data stored for that table in the CDC HDFS destination directory.
SharePlex Connector for Hadoop maps Oracle data types to the appropriate data types in the external Hive table, as shown below. This allows the user to make use of the built-in Hive UDFs available for the specific Hive data types.
| Oracle data type | Hive data type |
|---|---|
| CHAR | STRING |
| VARCHAR2 | STRING |
| NCHAR | STRING |
| NVARCHAR2 | STRING |
| NUMBER | DOUBLE |
| FLOAT | DOUBLE |
| DATE | TIMESTAMP |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMP WITH TIME ZONE | STRING |
| TIMESTAMP WITH LOCAL TIME ZONE | STRING |
| INTERVAL YEAR TO MONTH | STRING |
| INTERVAL DAY TO SECOND | STRING |
| ROWID | STRING |
Note: If the data type of any column in the Oracle table is modified, the Hive data type for that column is mapped to STRING.
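Because of this mapping, captured change data can be queried with standard Hive functions. The following sketch assumes an Oracle table SCHEMA.EMP with a DATE column HIREDATE (both names are hypothetical); since DATE maps to TIMESTAMP, the built-in year() UDF applies directly to the CDC records:
[user@host bin]$ hive -e "SELECT year(hiredate), COUNT(*) FROM schema.emp GROUP BY year(hiredate)"   # table and column names are hypothetical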
Custom Map-Reduce programs
SharePlex Connector for Hadoop generates change data record classes (Java utility classes) for each CDC table; custom Map-Reduce programs can use these classes to read the CDC data stored on HDFS. The change data record classes are generated under the “lib” directory and are also added to the lib/change-data-records.jar file.
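As a minimal sketch, such a program could be launched with the generated jar on the job classpath; the my-cdc-reader.jar file, the com.example.CdcReaderJob class, and the HDFS paths are hypothetical, and -libjars assumes the job driver uses ToolRunner:
[user@host bin]$ export HADOOP_CLASSPATH=../lib/change-data-records.jar:$HADOOP_CLASSPATH
[user@host bin]$ hadoop jar my-cdc-reader.jar com.example.CdcReaderJob -libjars ../lib/change-data-records.jar /cdc/input /cdc/output   # jar, class, and paths are hypothetical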
Use this script to start, restart, or shut down SharePlex Connector for Hadoop, and to reload SharePlex Connector for Hadoop configurations.
Shell Script Usage
conn_ctrl.sh [start|restart|shutdown|reload-conf|--help|--version|--status]
Options
| Parameter | Description |
|---|---|
| start | Start SharePlex Connector for Hadoop. |
| restart | Stop and start SharePlex Connector for Hadoop. |
| shutdown | Stop SharePlex Connector for Hadoop. |
| reload-conf | Reload some of the SharePlex Connector for Hadoop configurations without restarting SharePlex Connector for Hadoop. These configurations include … |
| --help | Show this help and exit. |
| --version | Show version information and exit. |
| --status | Show the status (started / shutdown) of SharePlex Connector for Hadoop. |
Example
[user@host bin]$ ./conn_ctrl.sh
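For example, to reload the supported configurations without a restart and then confirm that the connector is running:
[user@host bin]$ ./conn_ctrl.sh reload-conf
[user@host bin]$ ./conn_ctrl.sh --status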
Use this script to show a list of the tables being replicated and to monitor the status of a table under replication.
Shell Script Usage
conn_monitor.sh [--summary|--status <TABLE_OWNER.TABLE_NAME> [--hdfs|--hbase|--cdc]|--help|--version]
Options
| Parameter | Description |
|---|---|
| --summary | Show a summary of the tables under replication. |
| --status <TABLE_OWNER.TABLE_NAME> [--hdfs\|--hbase\|--cdc] | Show the status of the given table. Include the --hdfs option (optional) to limit the display to HDFS Near Real Time Replication. Include the --hbase option (optional) to limit the display to HBase Real Time Replication. Include the --cdc option (optional) to limit the display to the Change Data Capture (CDC) feature. |
| --help | Show this help and exit. |
| --version | Show version information and exit. |
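Example
For instance, to list the tables under replication and then check the CDC status of one of them (Schema.Table is a placeholder for your own owner.table pair):
[user@host bin]$ ./conn_monitor.sh --summary
[user@host bin]$ ./conn_monitor.sh --status Schema.Table --cdc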
Use this script to delete all data (HDFS, HBase, and Hive) for a table under replication by SharePlex Connector for Hadoop.
Shell Script Usage
conn_cleanup.sh -t <TABLE_OWNER.TABLE_NAME> [-h <HIVE_HOME_DIR>] [--help] [--version]
Options
| Parameter | Description |
|---|---|
| -t <TABLE_OWNER.TABLE_NAME> | Name and owner of the table whose data is to be cleaned up. Required. |
| -h <HIVE_HOME_DIR> | Path to the Hive home directory. If not specified, the value of the HIVE_HOME environment variable is used. If neither this option nor the HIVE_HOME environment variable is set, the path is resolved relative to HADOOP_HOME. |
| --help | Show this help and exit. |
| --version | Show version information and exit. |
Example
[user@host bin]$ ./conn_cleanup.sh -t Schema.Table
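If neither HIVE_HOME nor HADOOP_HOME resolves to your Hive installation, the Hive home directory can be passed explicitly with -h; the /usr/lib/hive path below is illustrative only:
[user@host bin]$ ./conn_cleanup.sh -t Schema.Table -h /usr/lib/hive   # path is illustrative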