Toad Data Point 6.1.2 - Release Notes


Data Profiling

Data Profiling allows you to inspect your data to assess its content and data quality. Use Data Profiling to find duplicates and nulls, to identify anomalies and patterns, and to view statistics about your data. Graphs and charts help you visualize data quality. Data Profiling can help you identify data quality issues prior to ETL processing.

In the Data Profiling window, use the main tabs to view summaries in each profiling category. Then click links to drill down to more-detailed information. After assessing the quality of your data, you can generate and save a Data Profiling report.

Use Data Profiling to perform the following types of analyses:

Uniqueness analysis—Find duplicates
Completeness analysis—Find missing and null values
Value distribution—Easily view data distribution patterns
Pattern analysis—Find/match patterns and anomalies
Range analysis—Find top and bottom values

Note: Data profiling using the Data Profiling module is available in the Toad Data Point Professional Edition only.
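To make the completeness, value-distribution, and range analyses concrete, here is a minimal Python sketch of how they could be computed for a single column. It is not Toad's implementation. It assumes several things. `None` stands for a null. An empty or whitespace-only string counts as a "missing" value, matching the Missing definition later on this page. The graph's "first 20 values" are taken to be the 20 most frequent, with everything else folded into "Others". The populated values are assumed to be mutually comparable, so min/max work. The names `is_missing` and `profile_column` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def is_missing(value: Any) -> bool:
    # "Missing" here means an empty or whitespace-only string; None is counted as Null.
    return isinstance(value, str) and value.strip() == ""


def profile_column(values: List[Any], top_n: int = 20) -> Dict[str, Any]:
    """Completeness, value-distribution, and range summary for one column."""
    nulls = sum(1 for v in values if v is None)
    missing = sum(1 for v in values if is_missing(v))
    populated = [v for v in values if v is not None and not is_missing(v)]

    # Value distribution: the top_n most frequent values; the rest are folded into "Others".
    counts = Counter(populated)
    top = counts.most_common(top_n)
    others = len(populated) - sum(count for _, count in top)

    # Range analysis: assumes the populated values are mutually comparable.
    low: Optional[Any] = min(populated) if populated else None
    high: Optional[Any] = max(populated) if populated else None

    return {
        "rows": len(values),
        "null": nulls,
        "missing": missing,
        "populated": len(populated),
        "distribution": top,
        "others": others,
        "min": low,
        "max": high,
    }


profile = profile_column(["NY", "CA", None, " ", "NY", "", "TX"], top_n=2)
print(profile)
```

For this sample column, the sketch reports 1 null, 2 missing values ("" and " "), and 4 populated rows. NY (2) and CA (1) are the top two values, TX falls into Others, and the range runs from CA to TX.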

To profile data

1. Use one of the following methods to send data to the Data Profiling window:
   - Right-click a data grid and select Send To | Data Profiling.
   - Right-click an object in the Object Explorer and select Data Profiling.
   - Select Tools | Profile and select a source from which to profile data (see Pick a Source Dialog). To enter a query instead, select Query.
2. In the Data Profiling window, select one of the category tabs to view profiling information.
3. Select a data column in the Column list to view profiling information for that column. To display the data for a specific value or group, double-click a row in one of the tables in the right pane, or click a block or bar in a graphic. Review the following table for additional information:

Summary Tab
Statistics grid	The top pane provides statistical summaries. Click a column header to sort by that column (this sorts both the statistics grid and the bar graph).
Bar graph	For each data column, the bar graph displays the percentages of unique and non-unique data. Hover over a bar to view its values. Click a bar to display the selected data in the Selected tab (bottom pane).
Legend	The Legend describes each column in the statistics grid (top pane). Click a column name to sort the statistics grid by that column (this also sorts the bar graph).
All Data Tab	Displays the data.
Selected Tab	Displays the selected data. The type of data selected and the column name appear above the grid. To specify the number of rows to display, click Edit Profile.
Statistics Tab	Allows you to view statistics for each data column.
Column Pane	Select a column name to view statistics for that column.
Uniqueness Percentages	A visual representation shows the amount (as percentages) of unique and non-unique data. Click a block to display the data in that group in the Selected tab (bottom pane). The groups are defined as follows:
	- Populated: Rows with real values (null and missing values are excluded)
	- Distinct: One occurrence of each real value
	- Unique: Distinct values that occur only once
	- Non-Unique: Distinct values that occur more than once (each such value is counted only once here). Non-Unique + Unique = Distinct.
	- Repeated Rows: For values that occur more than once, all occurrences after the first (that is, the occurrences not counted in Non-Unique)
	- Duplicates: For values that occur more than once, all occurrences of those values
	- Missing: Rows with blank, missing, or white-space values
	- Null: Rows whose value is null
	- All Data: The total number of rows analyzed
	Note: Clicking Non-Unique or Repeated Rows displays Duplicates.
Value Distribution	A bar graph displays the distribution of values; the first 20 values are shown. Others: Select this option to add a bar containing the remaining values to the graph. For numeric data, you can choose to overlay the bar graph with Statistics, Quartiles, and Percentiles. Click a bar in the graph to display the selected data in the Selected tab. Note: Dates are displayed using the date/time format best suited to the date distribution.
Value Summary	Displays the number (and percentage) of rows in each uniqueness group. Duplicates: All instances of a duplicated value, including the first (Non-Unique + Repeated Rows). Double-click a row to display the selected data in the Selected tab.
Statistics	Provides statistical analysis of values. Double-click Min or Max to display the rows matching the selected value.
Percentiles	View how data is distributed in the Grouped Frequency Distribution, Percentiles, and Values Distribution views. Double-click a value in Values Distribution or Grouped Frequency Distribution to see the original data.
Frequency Tab	View how data is distributed. Find value distribution patterns or trends within data. Double-click a table row to display selected data.
Column Pane	Select a column name to view frequency information for that data column.
Top Values	Lists the most-frequently occurring values.
Bottom Values	Lists the least-frequently occurring values.
First Values	Lists the first populated values in the table.
Last Values	Lists the last populated values in the table.
Patterns Tab	Identifies and lists patterns in string fields, and provides the count (and percentage) of values that match each pattern. Double-click a pattern to filter by that pattern and display the results in the Selected tab (Profiling) or the data grid (Transform and Cleanse). In Transform and Cleanse, click Undo Pattern Filter to remove the filter. Use the percentage (%) column to find the most frequently occurring pattern.
Word Patterns	Identifies and lists all word patterns in the data.
Letter Pattern	Identifies and lists all letter patterns in the data (collectively, for all identified word patterns).
Identified Domain	Toad automatically identifies the domain of the data and lists patterns based on that domain. Domains that Toad identifies include: Email, URL, IP Address, US Phone Number, US Zipcode, US Address, US Company Name, and US States.
Language Tab	Provides per-character language analysis for string fields. Double-click a table row to display selected data.
Character Language Distribution	Identifies and lists languages found in the data.
ASCII Character Distribution	Identifies and lists ASCII characters found in the data.
Duplicates Tab	Allows you to find duplicates. Select the checkbox for each column you want to include in the search. Click Check Duplicates.
Show/Hide Options	Click Show Options to specify options for this search. String Comparison: Select a method for comparing string values. Fuzzy: Uses a slightly modified double metaphone algorithm that processes each word separately. This method works well only for English-language text.
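The uniqueness groups in the Summary tab above are easy to confuse. This Python sketch shows how each count follows from its definition, and it checks the identities they imply (for example, Non-Unique + Unique = Distinct). It is an illustration, not Toad's code. It assumes `None` represents a null and that a blank or whitespace-only string is "missing". `uniqueness_groups` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List


def uniqueness_groups(values: List[Any]) -> Dict[str, int]:
    """Count the uniqueness groups described for the Uniqueness Percentages graphic."""
    def blank(v: Any) -> bool:
        return isinstance(v, str) and v.strip() == ""

    null = sum(1 for v in values if v is None)
    missing = sum(1 for v in values if blank(v))
    counts = Counter(v for v in values if v is not None and not blank(v))

    groups = {
        "populated": sum(counts.values()),
        "distinct": len(counts),
        "unique": sum(1 for c in counts.values() if c == 1),
        "non_unique": sum(1 for c in counts.values() if c > 1),
        # Every occurrence after the first, for values that occur more than once.
        "repeated_rows": sum(c - 1 for c in counts.values() if c > 1),
        # Every occurrence, for values that occur more than once.
        "duplicates": sum(c for c in counts.values() if c > 1),
        "missing": missing,
        "null": null,
        "all_data": len(values),
    }
    # Identities that follow directly from the definitions.
    assert groups["non_unique"] + groups["unique"] == groups["distinct"]
    assert groups["duplicates"] == groups["non_unique"] + groups["repeated_rows"]
    assert groups["populated"] + groups["missing"] + groups["null"] == groups["all_data"]
    return groups


print(uniqueness_groups(["a", "b", "a", "c", "a", "b", None, "  "]))
```

The sample column has 8 rows. "a" occurs three times, "b" twice, and "c" once, which gives 3 distinct values, 1 unique value, 2 non-unique values, 3 repeated rows, and 5 duplicates, plus 1 missing row and 1 null.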

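To illustrate the idea behind the Patterns tab, the sketch below maps each string value to a pattern and reports the count and percentage for each pattern, most frequent first. Toad's actual pattern notation is not documented here, so both notations are hypothetical. In the letter pattern, uppercase letters become `A`, lowercase letters become `a`, and digits become `9`. In the word pattern, each word is classified as `W` (letters only), `N` (digits only), or `X` (mixed). The function names are also illustrative.

```python
from collections import Counter
from typing import Callable, List, Tuple


def letter_pattern(value: str) -> str:
    # Hypothetical notation: A = uppercase letter, a = lowercase letter, 9 = digit.
    # Any other character (space, punctuation) is kept as-is.
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)


def word_pattern(value: str) -> str:
    # Hypothetical notation: W = all-letter word, N = all-digit word, X = mixed word.
    def kind(word: str) -> str:
        if word.isalpha():
            return "W"
        if word.isdigit():
            return "N"
        return "X"
    return " ".join(kind(w) for w in value.split())


def pattern_profile(values: List[str],
                    to_pattern: Callable[[str], str]) -> List[Tuple[str, int, float]]:
    """Return (pattern, count, percent) rows, most frequent pattern first."""
    counts = Counter(to_pattern(v) for v in values)
    total = len(values)
    return [(p, c, round(100 * c / total, 1)) for p, c in counts.most_common()]


phones = ["(555) 123-4567", "555-123-4567", "(555) 987-6543"]
print(pattern_profile(phones, letter_pattern))
print(word_pattern("221B Baker Street"))
```

Two of the three phone numbers share the pattern `(999) 999-9999` (66.7%), which makes the odd one out (`999-999-9999`, 33.3%) easy to spot as a formatting anomaly.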
You can modify the profiling options and then re-profile the data.
1. Click Edit Profile to open the Profiling Options dialog.
2. Specify the options to apply to this profiling session, and then click Profile Now.
To modify the data to profile, click Edit Profile, modify the query on the Query tab, and then click Profile Now.

To profile data within the Editor Window

1. Select Tools | Edit | SQL Editor.
2. Enter a query in the Editor and click Run SQL.
3. After the SQL executes, select the Profiling tab. Toad displays statistical summaries similar to those on the Summary tab in the Data Profiling module.
4. Click Full Profiling to send the data to the Data Profiling module.

To export a Data Profiling report

1. After profiling data, click Report in the Wizard bar. The report displays in a preview window.
2. Click the arrow beside and select an output format.
3. Specify export file options and click OK.
4. Select a file name and location in the Save As dialog.

Considerations and Limitations

Sampling is supported only for Oracle®, IBM® DB2®, SQL Server®, and MySQL databases.

Consideration/Limitation	Description
Binary columns are excluded	Data profiling excludes binary columns and any other data types whose values cannot be compared.
Support for server-side sampling	For Oracle®, IBM® DB2®, SQL Server®, and MySQL databases, the sampling step (a fixed number of rows or random sampling) is performed on the database server. This can reduce processing time and network load.
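As a rough illustration of what "sampling on the server side" means, the sketch below builds a query that lets each of the four supported databases do the sampling itself, so only the sampled rows cross the network. The SQL Toad actually generates is not documented on this page. These are simply the standard vendor clauses for a row limit (`TOP`, `LIMIT`, `FETCH FIRST`) and for random sampling (Oracle `SAMPLE`, SQL Server and DB2 `TABLESAMPLE`, and a `RAND()` filter for MySQL, which has no `TABLESAMPLE`). The helper `sample_query` and its dialect names are hypothetical. The table name is assumed to be a trusted identifier.

```python
from typing import Optional


def sample_query(dialect: str, table: str,
                 rows: Optional[int] = None,
                 percent: Optional[float] = None) -> str:
    """Build a hypothetical server-side sampling query.

    Pass rows for a first-N-rows sample, or percent for a random sample.
    The table name is assumed to be a trusted, already-quoted identifier.
    """
    if (rows is None) == (percent is None):
        raise ValueError("specify exactly one of rows or percent")
    d = dialect.lower()
    if rows is not None:
        if d == "sqlserver":
            return f"SELECT TOP ({rows}) * FROM {table}"
        if d == "mysql":
            return f"SELECT * FROM {table} LIMIT {rows}"
        if d in ("oracle", "db2"):
            # FETCH FIRST requires Oracle 12c or later.
            return f"SELECT * FROM {table} FETCH FIRST {rows} ROWS ONLY"
    else:
        if d == "oracle":
            return f"SELECT * FROM {table} SAMPLE ({percent})"
        if d == "sqlserver":
            # SQL Server samples data pages, so the row count is approximate.
            return f"SELECT * FROM {table} TABLESAMPLE ({percent} PERCENT)"
        if d == "db2":
            return f"SELECT * FROM {table} TABLESAMPLE BERNOULLI ({percent})"
        if d == "mysql":
            # MySQL has no TABLESAMPLE; keep each row with probability percent/100.
            return f"SELECT * FROM {table} WHERE RAND() < {percent / 100}"
    raise ValueError(f"no server-side sampling sketched for {dialect!r}")


print(sample_query("oracle", "SALES", percent=5))
print(sample_query("sqlserver", "dbo.Sales", rows=1000))
```

Any other dialect raises an error in this sketch. That mirrors the limitation above, where server-side sampling applies only to these four databases.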

Tips:

You can automate data profiling using the Profile Data activity. For more information, see Use Database Automation Activities.
Click to maximize a pane for better viewing.
