Source Details Insight
To see Source Details insights, select them from the dropdown in the top right corner. These insights show in more detail how your source datasets are used in the mastering flow and how the data in your sources overlaps. They are based on the last mastering flow run.
Whenever an insight card shows a blue hover state, you can select the card to view more details.
These insights are available only if you have at least two sources.
How do my source records overlap: This insight shows how many clusters include records from each source, and how many clusters contain records from each pair of source datasets.
In the example source overlap metrics below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv. The insight shows how many clusters include records from each of these sources, and how many clusters contain records from each pair of sources. For example:
- The first box, containing the number 145,573, means that there are 145,573 clusters that contain records from the Hubspot.csv dataset.
- The next box in the top row, containing the number 210, means that there are 210 clusters that contain records from the Hubspot.csv and Marketo.csv datasets.
- The fourth box in the top row, containing the number 669, means that there are 669 clusters that contain records from the Hubspot.csv and demo_dataset datasets.
From these metrics, you can see that the most overlap is between the SalesforceContacts.csv and demo_data datasets (18,914 clusters), and the least overlap is between the Hubspot.csv and Marketo.csv datasets (210 clusters).
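The counts in this insight can be reproduced outside the product. The following is a minimal sketch, assuming you have a mapping from cluster ID to the set of source datasets that contribute records to that cluster. The `clusters` data and the `overlap_counts` helper are hypothetical and for illustration only; they are not part of the product.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: cluster ID -> set of sources with records in that cluster.
clusters = {
    "c1": {"Hubspot.csv"},
    "c2": {"Hubspot.csv", "Marketo.csv"},
    "c3": {"Marketo.csv", "demo_data"},
    "c4": {"Hubspot.csv", "Marketo.csv", "demo_data"},
}


def overlap_counts(cluster_sources):
    """Count clusters per source and per unordered pair of sources."""
    per_source = Counter()
    per_pair = Counter()
    for sources in cluster_sources.values():
        # Each cluster counts once for every source it contains.
        per_source.update(sources)
        # Sort so that each pair always appears in the same order.
        per_pair.update(combinations(sorted(sources), 2))
    return per_source, per_pair


per_source, per_pair = overlap_counts(clusters)
print(per_source["Hubspot.csv"])                  # clusters containing Hubspot.csv
print(per_pair[("Hubspot.csv", "Marketo.csv")])   # clusters containing both sources
```

In this toy data, three clusters contain Hubspot.csv records and two contain both Hubspot.csv and Marketo.csv records, which corresponds to the first two boxes of the top row in the insight.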
How many of my clusters do not contain source records from another source: This insight shows how many clusters include records from a single source, and how many clusters include records from two, three, or more sources.
In the example below:
- 910K clusters contain records from only one source.
- 27.9K clusters contain records from two sources.
- 1,500 clusters contain records from three sources.
- 161 clusters contain records from four or more sources.
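As a rough illustration of how this breakdown could be derived, the sketch below groups clusters by the number of distinct sources they contain and folds everything with four or more sources into one bucket. The `clusters` data and the `source_count_distribution` helper are hypothetical names, not product APIs.

```python
from collections import Counter

# Hypothetical toy data: cluster ID -> set of sources with records in that cluster.
clusters = {
    "c1": {"Hubspot.csv"},
    "c2": {"Hubspot.csv", "Marketo.csv"},
    "c3": {"Marketo.csv", "demo_data"},
    "c4": {"Hubspot.csv", "Marketo.csv", "demo_data"},
    "c5": {"Hubspot.csv", "Marketo.csv", "demo_data",
           "demo_dataset", "SalesforceContacts.csv"},
}


def source_count_distribution(cluster_sources, cap=4):
    """Count clusters by number of distinct sources; `cap` means 'cap or more'."""
    distribution = Counter()
    for sources in cluster_sources.values():
        distribution[min(len(sources), cap)] += 1
    return distribution


distribution = source_count_distribution(clusters)
for n_sources in sorted(distribution):
    label = f"{n_sources}+" if n_sources == 4 else str(n_sources)
    print(f"{label} source(s): {distribution[n_sources]} cluster(s)")
```

The bucket for one source corresponds to clusters that do not contain records from any other source.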
Which of my sources have the least overlap: For each dataset in the left column, this insight shows how many of the clusters that contain its records contain no records from the corresponding dataset in the top row.
In the example below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv.
- Of the clusters that contain records from the Hubspot.csv dataset, there are 145,363 clusters that have no records from the Marketo.csv dataset.
- Of the clusters that contain records from the Marketo.csv dataset, there are 10,665 clusters that have no records from the Hubspot.csv dataset.
- Of the clusters that contain records from the demo_data dataset, there are 269,102 clusters that have no records from the SalesforceContacts.csv dataset.
- Of the clusters that contain records from the demo_dataset dataset, there are 1,688 clusters that have no records from the SalesforceContacts.csv dataset.
- Of the clusters that contain records from the SalesforceContacts.csv dataset, there are 523,173 clusters that have no records from the Marketo.csv dataset.
From these metrics, you can tell that the least overlap is between the SalesforceContacts.csv dataset and the Marketo.csv dataset (523,173 clusters). The SalesforceContacts.csv dataset has less overlap with the other sources.
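These values are consistent with the source overlap counts shown earlier: 145,573 clusters contain Hubspot.csv records and 210 of them also contain Marketo.csv records, so 145,573 - 210 = 145,363 contain Hubspot.csv records but no Marketo.csv records. The sketch below shows one way such a matrix could be computed, assuming a mapping from cluster ID to the set of contributing sources. The `clusters` data and the `non_overlap_matrix` helper are hypothetical and are not the product's implementation.

```python
# Hypothetical toy data: cluster ID -> set of sources with records in that cluster.
clusters = {
    "c1": {"Hubspot.csv"},
    "c2": {"Hubspot.csv", "Marketo.csv"},
    "c3": {"Marketo.csv", "demo_data"},
    "c4": {"Hubspot.csv", "Marketo.csv", "demo_data"},
}


def non_overlap_matrix(cluster_sources):
    """For each (row, column) pair of sources, count the clusters that
    contain records from the row source but none from the column source."""
    all_sources = sorted({s for srcs in cluster_sources.values() for s in srcs})
    matrix = {
        row: {col: 0 for col in all_sources if col != row}
        for row in all_sources
    }
    for srcs in cluster_sources.values():
        for row in srcs:
            for col in all_sources:
                if col != row and col not in srcs:
                    matrix[row][col] += 1
    return matrix


matrix = non_overlap_matrix(clusters)
print(matrix["Hubspot.csv"]["Marketo.csv"])  # Hubspot.csv clusters without Marketo.csv
print(matrix["Marketo.csv"]["Hubspot.csv"])  # Marketo.csv clusters without Hubspot.csv
```

Note that the matrix is not symmetric: the row source determines which clusters are considered, so swapping row and column generally gives a different count, as with the Hubspot.csv and Marketo.csv values in the example above.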