
SharePlex Connector for Hadoop 8.5.6 - Installation Guide

conn_snapshot.sh

Use this script to take a fresh copy of an Oracle table for replication.

Shell Script Usage

conn_snapshot.sh -t <TABLE_OWNER.TABLE_NAME> [-f <FILE_TYPE>] [-s <FIELD_SEPARATOR>] [-e <CREATE_EXTERNAL_TABLE>] [-h <HIVE_HOME_DIR>] [-m <NUM_OF_MAPPERS>] [-n <CHANGES_THRESHOLD>] [-i <TIME_THRESHOLD>] [-r] [-d] [-v] [--partition-key <LIST_OF_COLUMNS>] [--compression-codec <CODEC_NAME>] [--help] [--version]

Options

Parameter

Description

-t <TABLE_OWNER.TABLE_NAME>

Name and owner of the table to import from Oracle. Required.

-f <FILE_TYPE>

Applicable to HDFS Near Real Time Replication.

File type for import. [Text|Sequence|Avro] (Default = Text. Use -f Sequence for the Sequence file type and -f Avro for the Avro file type.)

-f Sequence

All data is replicated in Sequence files on HDFS. To read from or write to the Sequence files, you need access to the Writable classes used to generate them. See the SharePlex Connector for Hadoop lib/sqoop-records directory and lib/sqoop-records.jar.

-s <FIELD_SEPARATOR>

The separator between each field/column. The separator must be enclosed in single quotes.

-e <CREATE_EXTERNAL_TABLE>

Copy to Hive. [true|false] (Default = false. Use -e true to enable.)

If true then

  • Copy to Hive over HDFS if a copy is taken for HDFS Near Real Time Replication. Enabling the creation of an external table in Hive over HDFS informs the connector to maintain two versions of the replicated data on HDFS. Versioned data ensures that HDFS replication does not interfere with Hive queries, and merging operations (Map jobs) work without any side effects on Hive queries.
  • Copy to Hive over HBase if a copy is taken for HBase Real Time Replication.

-h <HIVE_HOME_DIR>

Path to the Hive home directory.

If not specified, the value of the HIVE_HOME environment variable is used. If neither this option nor the HIVE_HOME environment variable is set, the path is derived relative to HADOOP_HOME.
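For example, the Hive home directory can be supplied either way (the path below is illustrative only; substitute your actual Hive installation directory):

```shell
# Option 1: pass the Hive home directory explicitly (path is an example).
./conn_snapshot.sh -t Schema.Table -h /usr/lib/hive

# Option 2: export HIVE_HOME before running the script.
export HIVE_HOME=/usr/lib/hive
./conn_snapshot.sh -t Schema.Table
```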

-m <NUM_OF_MAPPERS>

The number of mappers to be used.

-n <CHANGES_THRESHOLD>

Use to override the default setting for how often SharePlex Connector for Hadoop replicates the table (measured by the number of changes to the table).

  • Applicable to HDFS Near Real Time Replication.
  • The default setting was set in conn_setup.sh.
  • Replication is executed on the first condition met: on the given number of changes to the table or the set time period -i, whichever comes first.
  • SharePlex Connector for Hadoop remembers this setting and makes use of it during further execution of conn_snapshot.

-i <TIME_THRESHOLD>

Use to override the default setting for how often SharePlex Connector for Hadoop replicates the table (measured by the number of minutes).

  • Applicable to HDFS Near Real Time Replication.
  • The default setting was set in conn_setup.sh.
  • Replication is executed on the first condition met: on the given number of changes to the table -n or the set time period, whichever comes first.
  • SharePlex Connector for Hadoop remembers this setting and makes use of it during further execution of conn_snapshot.

-r, -d

Use to override the settings in conn_setup.sh. If neither option is specified, this Oracle table is replicated using HBase and/or HDFS as per the settings in conn_setup.sh.

-r

A copy of the table is taken for HBase Real Time Replication.

Do not replicate this Oracle table using HDFS.

This overrides the settings in conn_setup.sh.

-d

A copy of the table is taken for HDFS Near Real Time Replication.

Do not replicate this Oracle table using HBase.

This overrides the settings in conn_setup.sh.

-r -d

A copy of the table is taken for HBase Real Time Replication and HDFS Near Real Time Replication.

This overrides the settings in conn_setup.sh.

SharePlex Connector for Hadoop remembers these settings and makes use of them during further execution of conn_snapshot.

-v

Verbose - Show detailed information of each step.

--partition-key <LIST_OF_COLUMNS>

Applicable to HDFS Near Real Time Replication. Use to provide a list of columns (along with a range, if required) on which table data should be partitioned on HDFS. The order of columns in this list determines the directory structure on HDFS.

--compression-codec <CODEC_NAME>

Applicable to HDFS Near Real Time Replication with the Avro file format. Use to specify the compression codec for the Avro file format. Currently only the snappy and deflate codecs are supported. Using a compression codec may degrade the performance of Hive queries.
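As an illustrative sketch (the column names below are hypothetical, and the exact range syntax for --partition-key should be checked against your version), partitioning and compression might be combined as:

```shell
# Avro import with snappy compression, partitioned by the listed columns.
# REGION and SALE_DATE are hypothetical column names; the directory
# structure on HDFS follows the column order given.
./conn_snapshot.sh -t Schema.Table -f Avro \
    --partition-key REGION,SALE_DATE \
    --compression-codec snappy
```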

--help

Show this help and exit.

--version

Show version information and exit.

Example

[user@host bin]$ ./conn_snapshot.sh -t Schema.Table -s ';'

Use Cases

 

Take a copy of the Oracle table Schema.Table for replication over HDFS and/or HBase as per the settings in conn_setup.sh.

conn_snapshot.sh -t Schema.Table -s ';'

As above and ...

  • Copy to Hive over HDFS if a copy is taken for HDFS Near Real Time Replication.
  • Copy to Hive over HBase if a copy is taken for HBase Real Time Replication.

conn_snapshot.sh -t Schema.Table -s ';' -e true

Take a copy of the Oracle table Schema.Table for replication over HBase. Do not replicate over HDFS.

conn_snapshot.sh -t Schema.Table -s ';' -r

Take a copy of the Oracle table Schema.Table for replication over HDFS. Do not replicate over HBase. Replication is set for every 20 minutes or 100 changes, whichever comes first.

conn_snapshot.sh -t Schema.Table -s ';' -d -i 20 -n 100

conn_cdc.sh

Use this script to start capturing change history data for insert, update, delete and before update operations on HDFS for a specified table name.

Note: For the conn_cdc script to run, the change history feature must be enabled in the setup script. Capturing history data for a table starts only when you execute the conn_cdc script for that table.

Shell Script Usage

conn_cdc.sh -t <TABLE_OWNER.TABLE_NAME> [-i] [-u] [-d] [-b] [--help] [--version]

Options

Parameter

Description

-t <TABLE_OWNER.TABLE_NAME>

Name and owner of the table to import from Oracle. Required.

-i

If specified, enables change history support for insert operations.

-u

If specified, enables change history support for update operations.

-d

If specified, enables change history support for delete operations.

-b

If specified, enables change history support for before update values.

--help

Show this help and exit.

--version

Show version information and exit.

Example

[user@host bin]$ ./conn_cdc.sh -t Schema.Table

Use Cases

 
If none of the parameters [-i] [-u] [-d] [-b] is provided, all captured changes (insert, update, and delete) are recorded by default. Before update values are not recorded.

conn_cdc.sh -t Schema.Table

Capture change data for insert operations only. The other operations are not recorded.

conn_cdc.sh -t Schema.Table -i

Capture change data for update and delete operations only. Insert and before update operations are not recorded.

conn_cdc.sh -t Schema.Table -u -d

Capture change data for update and before update operations. The other operations are not recorded.

conn_cdc.sh -t Schema.Table -u -b

Note: Execute the conn_cdc script again for the same table if you want to change which operations are captured.

 

Accessing captured history data written on HDFS

History data written on HDFS for a particular table can be accessed in two ways:

  1. Hive external tables pointing to history data on HDFS.

    SharePlex Connector for Hadoop creates external tables in Hive for the change history feature. The external Hive tables are created in a Hive database with the same name as the Oracle table owner/schema. Each external Hive table points to the data stored for that table in the change history HDFS destination directory.

    SharePlex Connector for Hadoop maps Oracle data types to appropriate data types in the external Hive table as shown below. This allows you to use the built-in Hive UDFs available for the specific Hive data types.

    Oracle data type                Hive data type
    CHAR                            STRING
    VARCHAR2                        STRING
    NCHAR                           STRING
    NVARCHAR2                       STRING
    NUMBER                          DOUBLE
    FLOAT                           DOUBLE
    DATE                            TIMESTAMP
    TIMESTAMP                       TIMESTAMP
    TIMESTAMP WITH TIME ZONE        STRING
    TIMESTAMP WITH LOCAL TIME ZONE  STRING
    INTERVAL YEAR TO MONTH          STRING
    INTERVAL DAY TO SECOND          STRING
    ROWID                           STRING

    Note: If the data type of any column in the Oracle table is modified, the Hive data type for that column is mapped to STRING.
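    As a sketch, assuming the external table is created in a Hive database named after the Oracle schema and carries a name derived from the Oracle table (the exact name the connector generates may differ; check your Hive database), the change history can be queried with standard HiveQL:

```shell
# Query the change history external table from the Hive CLI.
# "schema_name" and "table_name" are placeholders for illustration;
# substitute the database and table the connector actually created.
hive -e "SELECT * FROM schema_name.table_name LIMIT 10;"
```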

  2. Custom Map-Reduce programs

    SharePlex Connector for Hadoop generates change-data-record classes (Java utility classes) per table, which custom Map-Reduce programs can use to read the history data stored on HDFS. The change-data-record classes are generated under the lib directory and are also added to the lib/change-data-records.jar file.
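    A custom Map-Reduce job can put the generated record classes on its classpath with the standard Hadoop -libjars option. The jar name, main class, and HDFS paths below are hypothetical, for illustration only:

```shell
# Run a custom Map-Reduce job against the change history data,
# adding the generated record classes to the job classpath.
# my-cdc-reader.jar, com.example.CdcReader, and both paths are
# placeholder names; substitute your own job and directories.
hadoop jar my-cdc-reader.jar com.example.CdcReader \
    -libjars lib/change-data-records.jar \
    /shareplex/changehistory/SCHEMA.TABLE /tmp/cdc-output
```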

conn_ctrl.sh

Use this script to start, restart, or shut down SharePlex Connector for Hadoop, and to reload SharePlex Connector for Hadoop configurations.

Shell Script Usage

conn_ctrl.sh [start|restart|shutdown|reload-conf|--help|--version|--status]

Options

Parameter

Description

start

Start SharePlex Connector for Hadoop.

restart

Stop and start SharePlex Connector for Hadoop.

shutdown

Stop SharePlex Connector for Hadoop.

reload-conf

Reload some of the SharePlex Connector for Hadoop configuration settings without restarting SharePlex Connector for Hadoop. These settings include:

  • the HDFS Near Real Time Replication changes threshold and time interval.

--help

Show this help and exit.

--version

Show version information and exit.

--status

Show the status (started / shutdown) of SharePlex Connector for Hadoop.

Example

[user@host bin]$ ./conn_ctrl.sh
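Typical invocations, following the usage and options above:

```shell
./conn_ctrl.sh start        # start the connector
./conn_ctrl.sh --status     # show whether the connector is started or shut down
./conn_ctrl.sh reload-conf  # reload the HDFS NRT thresholds without a restart
./conn_ctrl.sh shutdown     # stop the connector
```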

conn_monitor.sh

Use this script to show a list of the tables being replicated and to monitor the status of a table under replication.

Shell Script Usage

conn_monitor.sh [--summary|--status <TABLE_OWNER.TABLE_NAME> [--hdfs|--hbase|--cdc]|--help|--version]

Options

Parameter

Description

--summary

Show a summary of the tables under replication.

--status <TABLE_OWNER.TABLE_NAME> [--hdfs|--hbase|--cdc]

Show the status of the given table.

Include the --hdfs option (optional) to limit the display to HDFS Near Real Time Replication.

Include the --hbase option (optional) to limit the display to HBase Real Time Replication.

Include the --cdc option (optional) to limit the display to the change history (formerly Change Data Capture) feature.

--help

Show this help and exit.

--version

Show version information and exit.
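Typical invocations, following the usage and options above:

```shell
./conn_monitor.sh --summary                     # summary of all tables under replication
./conn_monitor.sh --status Schema.Table         # full status for one table
./conn_monitor.sh --status Schema.Table --hdfs  # limit to HDFS Near Real Time Replication
```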
