Gaining Insights with Data Product Metrics
You can view metrics and data from your last mastering flow run. For any data product, open the Insights page in Curator, located to the right of the Source Records page.
Results Summary
For any data product, you can find a summary report on the Insights page. These insights are based on the last mastering flow run.

Sample Insights
The following insights are available:
- Source Records:
  - The number of source records in your data product.
  - Changes in source records: The number of records added, moved (updated cluster membership), and removed from your data product in the last flow run. The blue arrow and number also represent the changes in source records.
- Mastered Entities:
  - The number of mastered entities in your data product.
  - Changes in mastered entities: The number of mastered entities added, updated, and removed from your data product in the last flow run. The blue arrow and number also represent the changes in mastered entities.
- Deduplication: The percentage of source records that were duplicates. The blue arrow and number represent the change in deduplication rate.
- Records by Source: A list of all sources in your data product and the number of records in each source. The blue arrow and number represent the change in the number of records.
- Completion of Entity Attributes: For attributes in your data product, the percentage of completion and number of empty values. The blue arrow and number represent the change in these metrics.
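As a rough illustration of how two of these summary metrics could be derived, here is a minimal Python sketch. It is not the product's implementation. The function names are hypothetical, and the formulas are assumptions: the deduplication rate is taken to be the share of source records beyond one record per mastered entity, and an attribute value counts as empty when it is missing or blank.

```python
from typing import Optional, Sequence, Tuple


def deduplication_rate(num_source_records: int, num_mastered_entities: int) -> float:
    """Percentage of source records treated as duplicates.

    ASSUMPTION: each mastered entity keeps one record and every additional
    record in its cluster counts as a duplicate.
    """
    if num_source_records == 0:
        return 0.0
    duplicates = num_source_records - num_mastered_entities
    return 100.0 * duplicates / num_source_records


def attribute_completion(values: Sequence[Optional[str]]) -> Tuple[float, int]:
    """Return (percent complete, number of empty values) for one attribute.

    ASSUMPTION: None and whitespace-only strings are treated as empty.
    """
    empty = sum(1 for v in values if v is None or v.strip() == "")
    total = len(values)
    percent_complete = 100.0 * (total - empty) / total if total else 0.0
    return percent_complete, empty


# Toy inputs, made up for illustration only.
rate = deduplication_rate(num_source_records=1000, num_mastered_entities=800)
completion, empty_count = attribute_completion(["a", None, "", "b"])
print(rate, completion, empty_count)
```

The arrows in the report show the change since the previous run, which under the same assumptions would be the difference between two such values computed on consecutive runs.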
Source Details
To see Source Details insights, select that option from the dropdown in the top right corner. These insights provide more detail about how your source datasets are used in the mastering flow and how the data in your sources overlaps. These insights are based on the last mastering flow run.
How do my source records overlap: This insight shows how many clusters include records from each source, and how many clusters contain records from each pair of source datasets.
In the example source overlap metrics below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv. The insight shows how many clusters include records from each of these sources, and how many clusters contain records from each pair of sources. For example:
- The first box, containing the number 145,573, means that there are 145,573 clusters that contain records from the Hubspot.csv dataset.
- The next box in the top row, containing the number 210, means that there are 210 clusters that contain records from the Hubspot.csv and Marketo.csv datasets.
- The third box in the top row, containing the number 669, means that there are 669 clusters that contain records from the Hubspot.csv and demo_data datasets.
From these metrics, you can see that there is the most overlap between the SalesforceContacts.csv and demo_data datasets (18,914 clusters), and the least overlap between the Hubspot.csv and Marketo.csv datasets (210 clusters).

Sample source insights
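The source overlap counts described above can be reproduced outside the product with a short Python sketch. This is an illustration under stated assumptions, not the product's implementation: the input layout (pairs of cluster ID and source name), the function name source_overlap, and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def source_overlap(records):
    """Count clusters per source and per pair of sources.

    records: iterable of (cluster_id, source_name) pairs (hypothetical layout).
    Returns a Counter keyed by (source_a, source_b) with source_a <= source_b;
    the key (s, s) holds the number of clusters containing source s.
    """
    sources_by_cluster = {}
    for cluster_id, source in records:
        sources_by_cluster.setdefault(cluster_id, set()).add(source)

    overlap = Counter()
    for sources in sources_by_cluster.values():
        for s in sources:
            overlap[(s, s)] += 1  # cluster contains records from s
        for a, b in combinations(sorted(sources), 2):
            overlap[(a, b)] += 1  # cluster contains records from both a and b
    return overlap


# Toy records, made up for illustration only.
toy_records = [
    ("c1", "Hubspot.csv"), ("c1", "Marketo.csv"), ("c1", "Hubspot.csv"),
    ("c2", "Hubspot.csv"),
    ("c3", "Marketo.csv"), ("c3", "demo_data"),
]
overlap = source_overlap(toy_records)
for pair in sorted(overlap):
    print(pair, overlap[pair])
```

Each cluster is reduced to its set of distinct sources first, so a cluster with several records from the same source is still counted once per source and once per pair.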
How many of my clusters do not contain source records from another source: This insight shows how many clusters include records from a single source, and how many clusters include records from two, three, or more sources.
In the example below:
- 910K clusters contain records from only one source.
- 27.9K clusters contain records from two sources.
- 1,500 clusters contain records from three sources.
- 161 clusters contain records from four or more sources.

Sample insights
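The breakdown of clusters by number of contributing sources can be sketched in Python as follows. This is a minimal illustration, not the product's implementation. The input layout, the function name clusters_by_source_count, the cap parameter, and the toy records are assumptions made for this example.

```python
from collections import Counter


def clusters_by_source_count(records, cap=4):
    """Count clusters by how many distinct sources they draw from.

    records: iterable of (cluster_id, source_name) pairs (hypothetical layout).
    Clusters with `cap` or more sources share one bucket, keyed by `cap`.
    """
    sources_by_cluster = {}
    for cluster_id, source in records:
        sources_by_cluster.setdefault(cluster_id, set()).add(source)
    return Counter(min(len(sources), cap) for sources in sources_by_cluster.values())


# Toy records, made up for illustration only.
toy_records = [
    ("c1", "Hubspot.csv"), ("c1", "Marketo.csv"),
    ("c2", "Hubspot.csv"),
    ("c3", "demo_data"),
    ("c4", "Hubspot.csv"), ("c4", "Marketo.csv"), ("c4", "demo_data"),
    ("c4", "demo_dataset"), ("c4", "SalesforceContacts.csv"),
]
histogram = clusters_by_source_count(toy_records)
for n in sorted(histogram):
    label = f"{n} or more" if n == 4 else str(n)
    print(f"{label} source(s): {histogram[n]} cluster(s)")
```

The bucket keyed by 1 corresponds to clusters that contain records from only one source, which is the figure this insight highlights.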
Which of my sources have the least overlap: This insight shows, of all the clusters that contain records from a dataset in the left column, how many contain no records from the corresponding dataset in the top row.
In the example below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv.
- Of the clusters that contain records from the Hubspot.csv dataset, there are 145,363 clusters that have no records from the Marketo.csv dataset.
- Of the clusters that contain records from the Marketo.csv dataset, there are 10,665 clusters that have no records from the Hubspot.csv dataset.
- Of the clusters that contain records from the demo_data dataset, there are 269,102 clusters that have no records from the SalesforceContacts.csv dataset.
- Of the clusters that contain records from the demo_dataset dataset, there are 1,688 clusters that have no records from the SalesforceContacts.csv dataset.
- Of the clusters that contain records from the SalesforceContacts.csv dataset, there are 523,173 clusters that have no records from the Marketo.csv dataset.
From these metrics, you can tell that the least amount of overlap is between the SalesforceContacts.csv dataset and the Marketo.csv dataset (523,173 clusters). The SalesforceContacts.csv dataset has less overlap with the other sources.

Sample insights
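The least-overlap matrix described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation. The input layout, the function name non_overlap_matrix, and the toy records are hypothetical. The example figures in this section are consistent with a simple relationship: the 145,363 Hubspot.csv clusters with no Marketo.csv records equal the 145,573 Hubspot.csv clusters minus the 210 clusters shared with Marketo.csv.

```python
def non_overlap_matrix(records):
    """For each ordered pair (row, column) of sources, count the clusters
    that contain records from `row` but none from `column`.

    records: iterable of (cluster_id, source_name) pairs (hypothetical layout).
    Returns a nested dict: matrix[row][column] -> number of clusters.
    """
    sources_by_cluster = {}
    for cluster_id, source in records:
        sources_by_cluster.setdefault(cluster_id, set()).add(source)

    all_sources = sorted(set().union(*sources_by_cluster.values()))
    matrix = {
        row: {col: 0 for col in all_sources if col != row}
        for row in all_sources
    }
    for sources in sources_by_cluster.values():
        for row in sources:
            for col in all_sources:
                if col != row and col not in sources:
                    matrix[row][col] += 1
    return matrix


# Toy records, made up for illustration only.
toy_records = [
    ("c1", "Hubspot.csv"), ("c1", "Marketo.csv"),
    ("c2", "Hubspot.csv"),
    ("c3", "Marketo.csv"), ("c3", "demo_data"),
]
matrix = non_overlap_matrix(toy_records)
print(matrix["Hubspot.csv"]["Marketo.csv"])

# Cross-check using the figures quoted in this section.
hubspot_clusters = 145_573
hubspot_and_marketo = 210
print(hubspot_clusters - hubspot_and_marketo)
```

The matrix is not symmetric: the count for a row and column pair depends on how many clusters the row dataset appears in, so a large dataset can show a high count against every other source.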