
SharePlex Connector for Hadoop 8.5.5 - SharePlex Connector for Hadoop Installation Guide

Start and Stop the Derby Network Server

Use Case: Start and Stop the Derby Network Server

SharePlex Connector for Hadoop installs Apache Derby and starts the Derby Network Server.

Set the DERBY_HOME environment variable. Change the current working directory to the bin directory under the SharePlex Connector for Hadoop home and execute $ source shareplex_hadoop_env.sh.

To start Derby, execute:

$ java -jar $DERBY_HOME/lib/derbynet.jar start -p <PORT_NUM>

To stop Derby, execute:

$ java -jar $DERBY_HOME/lib/derbynet.jar shutdown -p <PORT_NUM>

Ensure you enter the same PORT_NUM that was set when you ran install.sh. The default is 1527. To look up the PORT_NUM you set, open the configuration file (by default connectorConfiguration.xml) in the conf directory under the SharePlex Connector for Hadoop home and find the port number entry:

<entry key="derbyPort">1527</entry>
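
For example, a minimal sketch of looking up the configured port and restarting the Derby Network Server with it, assuming the environment script has been sourced and the current directory is the SharePlex Connector for Hadoop home:

$ grep derbyPort conf/connectorConfiguration.xml
<entry key="derbyPort">1527</entry>
$ java -jar $DERBY_HOME/lib/derbynet.jar shutdown -p 1527
$ java -jar $DERBY_HOME/lib/derbynet.jar start -p 1527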

Command Reference

conn_setup.sh

Use this script to set up SharePlex Connector for Hadoop and provide the necessary configuration details. Setup is usually a one-time activity. This script is similar to the ora_setup utility in SharePlex for Oracle.

Shell Script Usage

[user@host bin]$ ./conn_setup.sh [-c <INPUT_FILE>] [-r] [-n] [-d] [--help] [--version]

Options


-c <INPUT_FILE>

Provide an input file that defines the configuration parameters with their values.

Each configuration parameter should be specified on a new line of the input file.

Format: Key=Value. Example: enableRealTimeCopy=True

You can use this parameter to modify your original configuration. When conn_setup.sh is run without parameters, it creates the file conf/connectorConfiguration.xml. A sketch of an input file appears below, after Configuration Parameters.

There is an example input file located at conf/conn_setup_template.properties

-r

Use to update the configuration:

  • Add/Update parameters for HBase Real Time Replication: the HBase column family name.
  • Tables will now replicate using HBase Real Time Replication unless otherwise specified in conn_snapshot.sh.

-n

Use to update the configuration:

  • Add/Update parameters for HDFS Near Real Time Replication: the HDFS destination directory and how often to copy data to HDFS, measured by time and by number of changes.
  • Tables will now replicate using HDFS Near Real Time Replication unless otherwise specified in conn_snapshot.sh.

-d

Use to update the configuration:

  • Add/Update parameters for Change Data Capture (CDC Feature): add the CDC destination directory.

--help

Show this help and exit.

--version

Show version information and exit.

Note: You must restart the connector after enabling any of these features with the -r/-n/-d options.
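
For example, to add the HDFS Near Real Time Replication parameters to an existing configuration, run the script with -n and then restart the connector (the restart procedure depends on your installation):

[user@host bin]$ ./conn_setup.sh -n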

Configuration Parameters

If conn_setup.sh is run without an input file then you will be prompted to supply a value to each of the configuration parameters. For more information, see Run conn_setup.sh.

Default values are provided within brackets. Press Enter to select the default value.
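
As an illustration of the input file accepted by -c, here is a minimal sketch. The key names below are assumptions for illustration; take the authoritative names from conf/conn_setup_template.properties.

# Illustrative keys only; see conf/conn_setup_template.properties for the real names
enableRealTimeCopy=True
derbyPort=1527

Then pass the file to the script (my_config.properties is a hypothetical file name):

[user@host bin]$ ./conn_setup.sh -c my_config.properties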

conn_snapshot.sh

Use this script to take a fresh copy of an Oracle table for replication.

Shell Script Usage

conn_snapshot.sh -t <TABLE_OWNER.TABLE_NAME> [-f <FILE_TYPE>] [-s <FIELD_SEPARATOR>] [-e <CREATE_EXTERNAL_TABLE>] [-h <HIVE_HOME_DIR>] [-m <NUM_OF_MAPPERS>] [-n <CHANGES_THRESHOLD>] [-i <TIME_THRESHOLD>] [-r] [-d] [-v] [--partition-key <LIST_OF_COLUMNS>] [--compression-codec <CODEC_NAME>] [--help] [--version]

Options


-t <TABLE_OWNER.TABLE_NAME>

Name and owner of the table to import from Oracle. Required.

-f <FILE_TYPE>

Applicable to HDFS Near Real Time Replication.

File type for import. [Text|Sequence|Avro] (Default = Text. Use -f Sequence for sequence file type and -f Avro for Avro file type.)

-f Sequence

All data is replicated in Sequence files on HDFS. To read/write the Sequence files you need access to the Writable classes used to generate them. See the SharePlex Connector for Hadoop lib/sqoop-records directory and lib/sqoop-records.jar.

-s <FIELD_SEPARATOR>

The separator between each field/column. The separator must be enclosed in single quotes.

-e <CREATE_EXTERNAL_TABLE>

Copy to Hive. [true|false] (Default = false. Use -e true to enable.)

If true:

  • Copy to Hive over HDFS if a copy is taken for HDFS Near Real Time Replication.
  • Copy to Hive over HBase if a copy is taken for HBase Real Time Replication.

-h <HIVE_HOME_DIR>

Path to the Hive home directory.

If not specified, the value of the HIVE_HOME environment variable is used. If neither this option nor the HIVE_HOME environment variable is set, the path defaults to a location relative to HADOOP_HOME.

-m <NUM_OF_MAPPERS>

The number of mappers to be used.

-n <CHANGES_THRESHOLD>

Use to override the default setting for how often SharePlex Connector for Hadoop replicates the table (measured by the number of changes to the table).

  • Applicable to HDFS Near Real Time Replication.
  • The default setting was set in conn_setup.sh.
  • Replication is triggered by whichever condition is met first: the given number of changes to the table or the time period set with -i.
  • SharePlex Connector for Hadoop remembers this setting and uses it during subsequent executions of conn_snapshot.

-i <TIME_THRESHOLD>

Use to override the default setting for how often SharePlex Connector for Hadoop replicates the table (measured by the number of minutes).

  • Applicable to HDFS Near Real Time Replication.
  • The default setting was set in conn_setup.sh.
  • Replication is triggered by whichever condition is met first: the number of changes to the table set with -n or the given time period.
  • SharePlex Connector for Hadoop remembers this setting and uses it during subsequent executions of conn_snapshot.

-r / -d

Use to override the settings in conn_setup.sh. If not specified, this Oracle table replicates using HBase and/or HDFS as per the settings in conn_setup.sh.

-r

A copy of the table is taken for HBase Real Time Replication.

Do not replicate this Oracle table using HDFS.

This overrides the settings in conn_setup.sh.

-d

A copy of the table is taken for HDFS Near Real Time Replication.

Do not replicate this Oracle table using HBase.

This overrides the settings in conn_setup.sh.

-r -d

A copy of the table is taken for HBase Real Time Replication and HDFS Near Real Time Replication.

This overrides the settings in conn_setup.sh.

SharePlex Connector for Hadoop remembers these settings and uses them during subsequent executions of conn_snapshot.

-v

Verbose. Show detailed information for each step.

--partition-key <LIST_OF_COLUMNS>

Applicable to HDFS Near Real Time Replication. Use to provide a list of columns (along with a range if required) on which table data should be partitioned on HDFS. The order of columns in this list specifies the directory structure on HDFS.

--compression-codec <CODEC_NAME>

Applicable to HDFS Near Real Time Replication for the Avro file format. Use to provide the compression codec (snappy or deflate) to be used for the Avro file format. Currently only the snappy and deflate compression codecs are supported for the Avro file format. Using a compression codec may degrade the performance of Hive queries.

--help

Show this help and exit.

--version

Show version information and exit.

Example

[user@host bin]$ ./conn_snapshot.sh -t Schema.Table -s ';'

Use Cases

 

Take a copy of the Oracle table Schema.Table for replication over HDFS and/or HBase as per the settings in conn_setup.sh

conn_snapshot.sh -t Schema.Table -s ';'

As above and ...

  • Copy to Hive over HDFS if a copy is taken for HDFS Near Real Time Replication.
  • Copy to Hive over HBase if a copy is taken for HBase Real Time Replication.

conn_snapshot.sh -t Schema.Table -s ';' -e true

Take a copy of the Oracle table Schema.Table for replication over HBase. Do not replicate over HDFS.

conn_snapshot.sh -t Schema.Table -s ';' -r

Take a copy of the Oracle table Schema.Table for replication over HDFS. Do not replicate over HBase. Replication is set for every 20 minutes or 100 changes - whichever comes first.

conn_snapshot.sh -t Schema.Table -s ';' -d -i 20 -n 100
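
As a further illustration, a hedged sketch that combines the Avro file type with a compression codec and a partition key. The comma-separated column list and the column names are assumptions for illustration; run conn_snapshot.sh --help for the exact format.

# DEPT_ID and HIRE_DATE are hypothetical column names
conn_snapshot.sh -t Schema.Table -f Avro --compression-codec snappy --partition-key DEPT_ID,HIRE_DATE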
