Transform and Cleanse Data
The Transformation and Cleanse tool allows you to manipulate and change data from a table, a data grid, or a SQL query. Transform and Cleanse leaves the source table intact. A sampling of data, 1000 rows or less, displays in the window.
After transforming data, you can export the data as an Excel or .csv file, save it to Local Storage, publish it to Intelligence Central, or send it to Automation. You can also save your work as a Transform and Cleanse document (.tct).
Important: Legacy Transform and Cleanse (.tdc) and Data Browser (.tdb) file formats are no longer supported in Toad Data Point 3.8 or later. Please recreate the content of legacy files using this new Transform and Cleanse tool.
To publish the transformed data to Intelligence Central as a view or snapshot, or to send the underlying query to the Editor, all applied rules must support SQL generation. Currently, SQL generation is supported for Oracle and MySQL connections only.
Note: This feature is available in the Toad Data Point Professional Edition only.
To transform and cleanse data
To send data to the Transform and Cleanse window, do one of the following:
- Right-click a data grid and select Send To | Transform and Cleanse.
- Right-click an object in the Object Explorer and select Transform and Cleanse.
- Select Tools | Transform and Cleanse and select a source for data to transform. Select Query to enter a query. The Pick a Source Dialog
- View data profiling information. In the Transform and Cleanse window, click within a column in the left pane to view data profiling information for that column in the right pane (Column tab). A similar and more in-depth data profiling interface is available in the Data Profiling tool. For an explanation of the data profiling information presented by both tools, see Data Profiling.
- Recommended rules. If applicable, Toad recommends new rules to apply based on a review of the data. If Toad has a suggestion, it is displayed in the Datatype section in the Column tab, and the Auto Transform button in the toolbar is enabled. Auto Transform - Rules Recommended by Toad
- Show/Hide Columns. Use the Show/Hide Columns pane to select which columns to display in the data grid.
- To create a new transform and cleanse rule, right-click the column in which you want to transform data and select the category of rule to create. The lower pane opens displaying the rule editor.
Use the rule editor to define a new rule. Review the following for additional information:
|Find and Replace
Find—Enter or select the value you want to replace.
- Load All Distinct Values—(Enabled for reduced sample sizes) Click to populate the Find drop-down list with all distinct values, regardless of the sample size you are using.
- Null Display Settings—(Enabled if Null is selected) Click to open the Environment-Grid page of the Options dialog to specify how null values are displayed in the grid throughout Toad.
Case Sensitive—Select to perform a case-sensitive search for the value.
Replace—Enter the value you want to use (replacement value).
Build a calculated column in the expression editor.
- To add a calculated column, enter a new column name.
- To replace the existing column, use the default name.
Use SQL Transforms
- Select this option to use functions supported by the database (for the current connection). This also allows Toad to generate SQL for this Calculated Column rule.
Do not select this option if you want to use functions evaluated by Toad internally. Transformation Functions If you do not select this option, the rule cannot support SQL generation.
Note: Rules that do not support SQL generation appear in blue text in the Steps tab.
In the expression editor, create the expression to use to calculate the column. Select items in the left pane to display options in the right pane. Double-click an option to add it to the editor. Single-click an operator to add it to the editor. Use the Search box to filter available options.
Using Internal Toad Functions in Expression Editor
If you create an expression without selecting the Use SQL Transforms option, the following apply:
- Strings must be enclosed in single quotes.
- If using a single quote (') or a backslash (\) as input in a constant, you must escape the character using a backslash (\), for example \' or \\.
- If using a carriage return (new line) as input, you must use one of the following formats:
- Windows: "\r\n"
- Macintosh, Unix, Linux: "\n"
- Macintosh (older OS): "\r"
- When using the ToChar date or numeric function, you must use .Net format strings. Transformation Functions
Tip: To quickly filter a column, click in the row below the column header and enter a value.
Create a filter condition by selecting a column, an operator, and a value. See Filter Data in Transform and Cleanse for more in-depth instructions.
- Click to add another condition.
- To select a different Boolean operator, click And.
- Click to add a group or to clear all conditions.
Note: Click to the right of a condition statement to add another value to a list of values, for example, when using an operator such as Is any of.
Load All Distinct Values—(Enabled for reduced sample sizes) Click to load all distinct values into the drop-down list of values.
Preview—Click to preview filtered data in upper pane.
Clear—Click to clear the filter and the rule editor.
(Numeric and datetime columns only)
Format—(Numeric) Select a numeric type.
Format Options—(Datetime) Select one of the date-only, date/time, or time-only formats.
Use English format instead of regional language setting—Select to quickly switch to English format options if you have a non-English regional setting.
Custom—(Datetime) Select to create a custom date/time format.
Caution: If you apply a Format rule to a numeric or datetime column, the column is converted to a String data type in the transformed data.
Select the column or columns in which to find duplicates.
Then for each column, select the method to use to match duplicates.
- Rounded—(Numeric) Select to round numeric values to the specified number of decimal places before matching.
- Significant—(Numeric) Select to match values using the specified number of significant digits.
- Fuzzy—(String) Select to use a slightly-modified, double metaphone algorithm where each word is passed through separately. This method only works well for English language text.
Filter To: Select the method of filtering and displaying duplicates.
- Distinct Rows—Removes all duplicates, except one row from each duplicate set. Duplicate rows display in the Duplicates tab.
- Unique Rows—Removes all duplicates. Duplicate rows display in the Duplicates tab. Only rows without duplicates remain in the upper pane.
- Duplicate Rows—Duplicate rows display in the upper pane. Rows without duplicates are removed.
The Duplicates tab opens after the rule is applied if the Distinct Rows or Unique Rows filtering method is selected.
Export Duplicates Separately—Select to export the duplicates when you export the transformed data. Then specify a file for the duplicates. When you click the Export button in the Wizard bar to export the transformed data to a .csv file, the duplicates are saved to the separate .csv file that you specified.
Specify how to trim values.
Simplify Consecutive Spaces—Select to convert one or more consecutive Unicode whitespace characters (like tab, carriage return, space) into one ASCII 32 (space) character.
Remove Control Characters—Select to remove ASCII control characters 0–31 (non-printing characters).
Select the data type to which you want to convert the column.
Round value to nearest integer instead of truncating—Select to instruct Toad to round values when converting to an integer data type.
The Problematic Rows tab opens if the data contains values that could not be converted. The problematic rows display in the tab and the problematic values are set to null in the converted column in the transformed data.
Use this rule to group rows into categories or ranges using a group column.
- Create a group condition by selecting an operator and values or by selecting a range. Enter a new value (or label) for the new group in the Value is field.
- Load All Distinct Values—(Enabled for reduced sample sizes) Click to load all distinct values into the drop-down list of values.
- In the Otherwise Value is field, do one of the following:
- Enter a default value to create a default group.
- Select the column name to use each row's current value as the default value.
- Click to add another condition to the rule.
- When you apply the rule, a group column is added to the transformed data.
Use this rule to rename columns. You can rename one or more columns using a single rule.
Enter a new name for each column you want to rename. If SQL generation is supported, the new name is applied as an alias.
Tip: Place cursor over column name in grid to display original name.
Remove one or more columns from the transformed data set.
- Clear the check box for each column you want to remove.
- Remove Hidden Columns—Select to remove the columns that are hidden in the Show/Hide Columns pane at the time you create the rule.
The columns you remove are not included in the SQL statement (when SQL generation is supported).
(String columns only)
Split At—Select the method to use to split the column. Select one of the following:
Separator—Splits the column based on a separator, such as a blank space, tab, new line, or string. Toad adds suggestions to the drop-down list of separators, based on the contents of the selected column.
Select a character to use as the separator, or enter a character or string. Then select the direction from which to find the character (or string) in the string.
- At—Splits the column at the character (or string) and excludes it from the new columns.
- After—Splits the column after the separator, regardless of the direction selected.
- Before—Splits the column before the separator, regardless of the direction selected.
Position—Splits the column at a specific position in the string. This method creates two columns.
Select the number of rightmost or leftmost characters to include in the first column. Then select the direction from which to determine this number. The second column includes the remainder of the string. Whitespace characters (blank, new line, etc.) are included in the character count.
Split Into—(Separator method only) Select the number of new columns to create.
- N—Creates N new columns, regardless of the number of splits generated. This option can result in new columns that contain only NULL values.
All—Creates as many new columns as required, but not more than ten columns. With this option, all columns contain at least one not-NULL value.
In the grid, you will see only the number of new columns required for the current sample. If you change the sample size (Edit Sample), you can refresh the number of new columns displayed in the grid if necessary. Select the rule and click Update Rule in the rule editor.
- If you use the All option and then apply a rule to one of the newly-created columns, that column persists even if the data changes.
- If you use the All option and then export to .csv, ten new columns are exported, even if one or more columns contain all NULL values.
(Date/time columns only) Extract the individual date or time components from the date/time field.
Extract—Select the components to extract. An individual column is created for each component selected.
- By default, new columns contain numeric values (except columns created from the Date or Time options). This allows the values to be used in a calculated column.
Use Descriptive Names Instead of Numeric Values—Select to use names (string values) where possible, for example May instead of 5.
Note: Toad uses the same definition for Week of Month as Excel.
- Click Apply Rule to apply the rule to the data set. Toad displays the transformed data in the upper pane and adds the rule to the Steps tab in the right pane.
Steps tab. Select the Steps tab to view the list of currently-applied rules. For each rule in the Steps tab, a tool tip displays additional information. Review the following tool tip icons.
||Indicates that the rule does not support SQL generation. The rule is shown in blue in the Steps tab.|
||Indicates that the rule produces errors/exceptions in the sample data. The rule is shown in red in the Steps tab. See Exception Handling.|
- Create additional rules as necessary.
SQL tab. (Oracle and MySQL only) To review the SQL for all rules, select the SQL tab behind the rule editor (click a rule in Steps tab to display the rule editor). The SQL tab displays the SQL generated for the underlying query and all the applied rules, if all rules support SQL generation.
Rules that do not support SQL generation are shown in blue in the Steps tab for easy identification. This gives you the opportunity to modify or remove the rule if SQL generation is necessary.
Note: Currently, SQL generation is supported for Oracle and MySQL connections only. For other database types, SQL is not generated.
- To remove a rule, right-click the rule in the Steps tab and select Delete. To save the rule, add it to the Repository before deleting it.
- Transforms Repository. To add a rule to the Repository, right-click the rule in the Steps tab and select Add to Transforms Repository.
- To rename a rule in the Repository, right-click the rule and select Rename.
- To apply a rule in the Repository, select the rule in the Repository tab and click Apply Rule in the rule editor. You can also drag a rule from the Repository to the column to which you want to apply it.
- To create a new folder, click .
- Rules in the Repository persist between Toad sessions.
- You can share rules if you are using Toad Intelligence Central. See Transforms Repository.
- Add a sequence of rules to the Repository by selecting multiple rules in the Steps tab. Then right-click and select Add to Transforms Repository. The rules are added to the Repository as a group (sequence).
- To apply a sequence from the Repository, select the sequence in the Repository. Then in the rule editor, select the column to which you want to apply each rule. When finished, click Apply sequence to columns.
- To edit a rule, select the rule in the Steps tab. Modify the rule in the rule editor and then click Update Rule.
- To edit a rule or sequence in the Repository, select the rule/sequence in the Repository. Modify the rule/sequence and click .
- You can save your work as a Toad Transform and Cleanse document (.tct) by selecting File | Save File.
Use the Wizard bar to export or publish the transformed data, send the SQL to the Editor, or automate the task.
||Send SQL to Editor. This action is enabled only when all currently-applied rules are supported for SQL generation.|
Export the transformed data to a .csv or an Excel file. Specify Excel Export Options
If rules produce errors/exceptions in the data, see Exception Handling.
Save the data to local storage. Save Data to Local Storage
If rules produce errors/exceptions in the data, see Exception Handling.
Publish the transformed data to Toad Intelligence Central.
Important: To publish transformed data to Intelligence Central as a view or snapshot, all applied rules must support SQL generation. Otherwise, data is published as a dataset.
Publish to Toad Intelligence Central
If rules produce errors/exceptions in the data, see Exception Handling.
||Publish the Transform and Cleanse file to Intelligence Central.|
||Automate the data transformation task. Use Database Automation Activities|
||Save your work as a Transform and Cleanse document (.tct) and add it to the currently-open project in the Project Manager.|
- Click Edit Sample to specify sample size, modify the query, or specify error handling options. See Transform and Cleanse Options.
- Filter data by pattern. To filter the data grid by a pattern, double-click a pattern in the Top Patterns pane. Click Undo Pattern Filter to remove filter.
- New rules are given auto-generated names. You can rename a rule or sequence in the Repository. However, when it is reapplied to the data set, the name reverts back to the auto-generated name.
- When data includes high-precision numbers (i.e., those with a large number of decimal places), results from using transformation functions may not always be as expected. For best results, use numbers that have not more than 15 significant digits.
- To enable ignored messages, go to Tools | Options | Environment | Confirmations.
- Once you select Load All Distinct Values for a column, the list is cached and made available in all applicable rule editors.
Auto Transform - Rules Recommended by Toad
To help you transform and cleanse data, Toad reviews the data to be transformed and recommends rules to apply. If Toad recommends a rule for a column, when you select the column, the suggested rule is displayed in the Column tab (right pane).
Click Auto Transform in the toolbar to display the list of all the recommended rules for the data set.
Toad provides recommendations for the following rules: Trim and Convert Data Type.
To find and apply recommended rules (individually)
- Select a column in the data to be transformed. If Toad has a suggestion, it is displayed in the Datatype panel in the Column tab (right pane). The actual type and the count of the targeted values are provided (red text), and the suggested rule displays as a green button.
- To apply the rule, click the green button displaying the rule.
- To view the rule after applying it, select the Steps tab and then select the rule.
- You can then modify the rule in the rule editor, if necessary. Select the SQL tab to view the rule as SQL.
To apply all recommended rules (Auto Transform)
Click Auto Transform in the Transform and Cleanse toolbar.
Note: If there are no recommended rules to apply, the Auto Transform button is disabled.
The Auto Transform dialog opens displaying a list of recommended rules. Rules that will have the most impact are shown in green. Place the cursor over each rule to display the suggestion's justification. Select the rules you want to apply.
- Then click Auto Transform at the bottom of the dialog. Toad applies the selected rules and adds them to the Steps list.
- If you did not apply all recommended rules, you can click Auto Transform in the toolbar again to display the remaining recommended rules. Select the rules you want to apply and click Auto Transform.
Toad provides the following features to help you identify and resolve errors/exceptions created when transforming data.
Rows with Exceptions tab—If you apply (or update) a rule that produces errors/exceptions in the sample data, the Rows with Exceptions tab displays in the rule editor pane. The tab shows the rows containing exceptions (from the sample data only) and provides a description of the error.
Error Zip File—When you attempt to export or publish transformed data that contains rows with exceptions, Toad generates an Error zip file (compressed folder). The zip contains one .csv file for each exception-generating rule. Each rule file lists the exception rows generated by that rule. When export/publishing is finished, a dialog opens providing links to the Error zip file and the individual rule files.
Review the following information about exception handling:
- Rules that generate errors are shown in red in the Steps tab for easy identification.
- Toad generates the Error zip file when necessary after the following export actions: export to .csv or Excel, save to Local Storage, or publish data to Intelligence Central.
- The Error files are automatically saved to the location specified in the Transform and Cleanse Options dialog. Click Edit Sample to open this dialog. Transform and Cleanse Options
Considerations and Limitations when Transforming Data