Source Details Insights
Source Details insights show how your source datasets are used in the mastering flow and how records from different sources overlap. These insights are based on the most recent mastering flow run.
Whenever a metric on an insight card shows a blue hover state, you can select it to view the individual records that were used to compute that metric.
These insights are only available if you have at least 2 sources.
How do my source records overlap: This insight shows how many clusters include records from each source, and how many clusters contain records from each pair of source datasets.
In the example source overlap metrics below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv. The insight shows how many clusters include records from each of these sources, and how many clusters contain records from each pair of sources. For example:
- The first box, containing the number 145,573, means that 145,573 clusters contain records from the Hubspot.csv dataset.
- The next box in the top row, containing the number 210, means that 210 clusters contain records from both the Hubspot.csv and Marketo.csv datasets.
- The fourth box in the top row, containing the number 669, means that 669 clusters contain records from both the Hubspot.csv and demo_dataset datasets.
From these metrics, you can see that there is the most overlap between the SalesforceContacts.csv and demo_data datasets (18,914 clusters), and the least overlap between the Hubspot.csv and Marketo.csv datasets (210 clusters).

Sample source insights
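To make the overlap counts concrete, here is a minimal Python sketch of how per-source and per-pair cluster counts could be computed. It assumes a hypothetical export of the last mastering flow run as (record ID, source dataset, cluster ID) tuples; the IDs and the helpers `sources_per_cluster`, `overlap_matrix`, and `pair_count` are illustrative and not part of the product. Diagonal entries count clusters with any record from a source; off-diagonal entries count clusters with records from both sources in a pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical export of the last mastering flow run:
# (record ID, source dataset, cluster ID) for every record.
records = [
    ("r1", "Hubspot.csv", "c1"),
    ("r2", "Marketo.csv", "c1"),
    ("r3", "Hubspot.csv", "c2"),
    ("r4", "SalesforceContacts.csv", "c2"),
    ("r5", "demo_data", "c2"),
    ("r6", "SalesforceContacts.csv", "c3"),
]


def sources_per_cluster(records):
    """Map each cluster ID to the set of sources that contribute records to it."""
    clusters = {}
    for _record_id, source, cluster_id in records:
        clusters.setdefault(cluster_id, set()).add(source)
    return clusters


def overlap_matrix(records):
    """Count clusters per source (diagonal) and per unordered source pair."""
    counts = Counter()
    for sources in sources_per_cluster(records).values():
        for source in sources:
            counts[(source, source)] += 1
        # Sort so each pair is stored under one canonical key.
        for a, b in combinations(sorted(sources), 2):
            counts[(a, b)] += 1
    return counts


def pair_count(matrix, a, b):
    """Look up a pair count regardless of argument order."""
    return matrix[tuple(sorted((a, b)))]


matrix = overlap_matrix(records)
print(matrix[("Hubspot.csv", "Hubspot.csv")])            # 2
print(pair_count(matrix, "Hubspot.csv", "Marketo.csv"))  # 1
```

A cluster is counted once per source no matter how many records that source contributes, which is why the sketch works on the set of sources per cluster rather than on raw record counts.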
How many of my clusters do not contain source records from another source: This insight shows how many clusters include records from a single source, and how many clusters include records from two, three, or more sources.
In the example below:
- 910K clusters contain records from only one source.
- 27.9K clusters contain records from two sources.
- 1,500 clusters contain records from three sources.
- 161 clusters contain records from four or more sources.

Sample insights
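The breakdown by number of sources can be sketched the same way: count the distinct sources in each cluster and bucket the result, grouping four or more together as in the example. This is a minimal illustration that assumes the same hypothetical (record ID, source dataset, cluster ID) export; `source_count_histogram` and its `cap` parameter are illustrative names, not a product API.

```python
from collections import Counter

# Hypothetical export: (record ID, source dataset, cluster ID).
records = [
    ("r1", "Hubspot.csv", "c1"),
    ("r2", "Hubspot.csv", "c1"),   # two records, but still only one source
    ("r3", "Hubspot.csv", "c2"),
    ("r4", "Marketo.csv", "c2"),
    ("r5", "Hubspot.csv", "c3"),
    ("r6", "Marketo.csv", "c3"),
    ("r7", "demo_data", "c3"),
    ("r8", "Hubspot.csv", "c4"),
    ("r9", "Marketo.csv", "c4"),
    ("r10", "demo_data", "c4"),
    ("r11", "SalesforceContacts.csv", "c4"),
    ("r12", "SalesforceContacts.csv", "c5"),
]


def source_count_histogram(records, cap=4):
    """Count clusters by number of distinct sources, grouping `cap` or more together."""
    clusters = {}
    for _record_id, source, cluster_id in records:
        clusters.setdefault(cluster_id, set()).add(source)
    histogram = Counter()
    for sources in clusters.values():
        n = len(sources)
        histogram[f"{cap}+" if n >= cap else str(n)] += 1
    return histogram


histogram = source_count_histogram(records)
print(dict(sorted(histogram.items())))  # {'1': 2, '2': 1, '3': 1, '4+': 1}
```

The "1" bucket is the set of clusters that contain no records from another source, which is the number this insight highlights.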
Which of my sources have the least overlap: For each dataset in the left column, this insight shows how many of the clusters that contain records from that dataset contain no records from the corresponding dataset in the top row.
In the example below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv.
- Of the clusters that contain records from the Hubspot.csv dataset, 145,363 clusters have no records from the Marketo.csv dataset.
- Of the clusters that contain records from the Marketo.csv dataset, 10,665 clusters have no records from the Hubspot.csv dataset.
- Of the clusters that contain records from the demo_data dataset, 269,102 clusters have no records from the SalesforceContacts.csv dataset.
- Of the clusters that contain records from the demo_dataset dataset, 1,688 clusters have no records from the SalesforceContacts.csv dataset.
- Of the clusters that contain records from the SalesforceContacts.csv dataset, 523,173 clusters have no records from the Marketo.csv dataset.
From these metrics, you can tell that the least overlap is between the SalesforceContacts.csv and Marketo.csv datasets (523,273 clusters). In general, the SalesforceContacts.csv dataset has less overlap with the other sources.

Sample insights
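The "least overlap" value for a row dataset A and a column dataset B is the number of clusters that contain records from A but none from B. Unlike the pairwise overlap count, it is not symmetric, which is why Hubspot.csv versus Marketo.csv (145,363) differs from Marketo.csv versus Hubspot.csv (10,665). It should also equal the clusters for A minus the clusters shared by A and B; the example numbers are consistent with this, since 145,573 - 210 = 145,363. The minimal Python sketch below assumes a hypothetical mapping from cluster ID to the set of contributing sources; all function names are illustrative.

```python
# Hypothetical input: the set of source datasets that contribute records
# to each cluster from the last mastering flow run.
clusters = {
    "c1": {"Hubspot.csv", "Marketo.csv"},
    "c2": {"Hubspot.csv"},
    "c3": {"Hubspot.csv", "SalesforceContacts.csv"},
    "c4": {"Marketo.csv"},
    "c5": {"SalesforceContacts.csv"},
}


def cluster_count(clusters, source):
    """Clusters that contain at least one record from `source`."""
    return sum(1 for sources in clusters.values() if source in sources)


def pair_overlap(clusters, a, b):
    """Clusters that contain records from both `a` and `b`."""
    return sum(1 for sources in clusters.values() if a in sources and b in sources)


def no_overlap_count(clusters, row_source, column_source):
    """Of the clusters with records from row_source, how many have none from column_source."""
    return sum(
        1
        for sources in clusters.values()
        if row_source in sources and column_source not in sources
    )


print(no_overlap_count(clusters, "Hubspot.csv", "Marketo.csv"))  # 2
print(no_overlap_count(clusters, "Marketo.csv", "Hubspot.csv"))  # 1

# The same value follows from clusters(row) - clusters(row and column).
assert no_overlap_count(clusters, "Hubspot.csv", "Marketo.csv") == (
    cluster_count(clusters, "Hubspot.csv")
    - pair_overlap(clusters, "Hubspot.csv", "Marketo.csv")
)
```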