Chat now with support
Chat with Support

Toad Data Point 6.1.2 - Release Notes

Data Profiling

Data Profiling allows you to inspect your data to assess its content and data quality. Use Data Profiling to find duplicates and nulls, to identify anomalies and patterns, and to view statistics about your data. Graphs and charts help you visualize data quality. Data Profiling can help you identify data quality issues prior to ETL processing.

In the Data Profiling window, use the main tabs to view summaries in each profiling category. Then click links to drill down to more-detailed information. After assessing the quality of your data, you can generate and save a Data Profiling report.

Use Data Profiling to perform the following types of analyses:

  • Uniqueness analysis—Find duplicates
  • Completeness analysis—Find missing and null values
  • Value distribution—Easily view data distribution patterns
  • Pattern analysis—Find/match patterns and anomalies
  • Range analysis—Find top and bottom values

Note: Data profiling using the Data Profiling module is available in the Toad Data Point Professional Edition only.

To profile data

  1. Use one of the following methods to send data to the Data Profiling window:
    • Right-click a data grid and select Send To | Data Profiling.
    • Right-click an object in the Object Explorer and select Data Profiling.
    • Select Tools | Profile and select a source from which to profile data. Select Query to enter a query. The Pick a Source Dialog
  2. In the Data Profiling window, view profiling information by selecting one of the categories from among the tabs.
  3. Then select a data column in the Column list to view profiling information for that column. To display data for a specific value or group, double-click a row in one of the tables in the right pane or click a block/bar in a graphic. Review the following for additional information:

    Summary Tab  

    Statistics grid

    The top pane provides statistical summaries.

    Click a column header to sort by that column (sorts statistics grid and bar graph).

    Bar graph

    For each data column, the bar graph displays the amount of unique/non-unique data as percentages.

    Hover over a bar to view values.

    Click a bar to display the selected data in the Selected tab (bottom pane).

    Legend

    The Legend provides a description for each column in the statistics grid (top pane).

    Click a column name to sort the statistics grid by that column (also sorts bar graph).

    All Data Tab

    Displays the data.
    Selected Tab

    Displays the selected data. The type of data selected and the column name appear above the grid.

    To specify the number of rows to display, click Edit Profile.

    Statistics Tab Allows you to view statistics for each data column.

    Column Pane

    Select a column name to view statistics for that column.

    Uniqueness Percentages

    A visual representation shows the amount (as percentages) of unique and non-unique data.

    Click a block to display the selected data in that group (displays in the Selected tab, bottom pane). Review the following definitions to learn what each group is comprised of:

    • Populated—Rows with real values (excludes null and missing values)
    • Distinct—This group includes one occurrence of all real values
    • Unique—Within the distinct group, the values that occur only once
    • Non-Unique—Within the distinct group, the values that occur more than once (but only one occurrence of the value is counted here)

      Non-unique + unique = distinct

    • Repeated Rows—For values that occur more than once, this group includes all subsequent occurrences of the values not included in non-unique
    • Duplicates—For values that occur more than once, this group includes all occurrences of the values
    • Missing—Rows with blank, missing, or white space values
    • Null—Rows with the value "Null"
    • All Data—The total number of rows analyzed

    Note: Clicking Non-Unique or Repeated Rows displays Duplicates.

    Value Distribution

    A bar graph displays the distribution of values. The first 20 values are shown.

    Others—Select this option to add a bar containing the remaining values to the graph.

    For numeric data, you can select to overlay the bar graph with Statistics, Quartiles, and Percentiles.

    Click a bar in the graph to display the selected data in the Selected tab.

    Note: Dates are shown using the best date/time format for date distribution.

    Value Summary

    Displays the number (and percentage) of rows in each uniqueness group.

    • Duplicates—The first and all instances of a row that is duplicated (non-unique + repeated rows)

    Double-click a row to display the selected data in the Selected tab.

    Statistics

    Provides statistical analysis of values.

    Double-click Min or Max to display the rows matching the selected value.

    Percentiles

    View how data is distributed across:

    • Grouped Frequency Distribution
    • Percentiles
    • Values Distribution

    Double-click a value in Values Distribution or Grouped Frequency Distribution to see original data.

    Frequency Tab

    View how data is distributed. Find value distribution patterns or trends within data.

    Double-click a table row to display selected data.

    Column Pane

    Select a column name to view frequency information for that data column.

    Top Values

    Lists the most-frequently occurring values.

    Bottom Values

    Lists the least-frequently occurring values.

    First Values

    Lists the first populated values in the table.

    Last Values

    Lists the last populated values in the table.
    Patterns Tab

    Identifies and lists patterns in data for string fields. Provides the count (and percentages) of values that match each pattern.

    • Double-click a pattern to filter by that pattern and display results in the Selected tab (Profiling) or data grid (Transform and Cleanse).
    • (Transform and Cleanse) Click Undo Pattern Filter to remove the filter.
    • Use the percentage (%) column to find the most frequently occurring pattern.

    Word Patterns

    Identifies and lists all word patterns in the data.

    Letter Pattern

    Identifies and lists all letter patterns in the data (collectively, for all identified word patterns).

    Identified Domain

    Toad automatically identifies a domain, and identifies and lists patterns based on that domain. Domains that Toad identifies include: Email, URL, IP Address, US Phone Number, US Zipcode, US Address, US Company Name, and US States.
    Language Tab

    Provides language analysis per character for string fields.

    Double-click a table row to display selected data.

    Character Language Distribution

    Identifies and lists languages found in the data.
    ASCII Character Distribution Identifies and lists ASCII characters found in the data.
    Duplicates Tab

    Allows you to find duplicates.

    1. Select the checkbox for each column you want to include in the search.
    2. Click Check Duplicates.

    Show/Hide Options

    Click Show Options to specify options for this search.

    String Comparison—Select a method for comparing string values.

    • Fuzzy—Uses a slightly-modified, double metaphone algorithm where each word is passed through separately. This method only works well for English language text.
  4. You can modify the profiling options and then re-profile the data.
    1. Click Edit Profile to open the Profiling Options dialog. Profiling Options
    2. Specify options to apply to this profiling session. Click Profile Now.
  5. To modify the data to profile, select Edit Profile and modify the query (Query tab). Click Profile Now.

To profile data within the Editor Window

  1. Select Tools | Edit | SQL Editor.
  2. Enter a query in the Editor and click Run SQL.
  3. Select the Profiling tab after the SQL executes. Toad displays statistical summaries similar to the Summary tab in the Data Profiling module.
  4. Click Full Profiling to send the data to the Data Profiling module.

To export a Data Profiling report

  1. After profiling data, click Report in the Wizard bar. The report displays in a preview window.
  2. Click the arrow beside and select an output format.
  3. Specify export file options and click OK.
  4. Select a file name and location in the Save As dialog.

Considerations and Limitations

Sampling is only supported for Oracle®, IBM® DB2®, SQL Server®, and MySQL databases.

Consideration/Limitation Description
Binary columns are excluded Data profiling excludes binary columns or any other data type that is not comparable.
Support for server-side sampling For Oracle®, IBM® DB2®, SQL Server®, and MySQL databases, the sampling step (number of rows sampled or random sampling) is applied to the database on the server side. This is an advantage as it can reduce processing time and network load.

Tips:

  

Related Topics

Related Documents

The document was helpful.

Select Rating

I easily found the information I needed.

Select Rating