Source Details Insight

To see Source Details insights, select that option from the dropdown in the top right corner. These insights show in more detail how your source datasets are used in the mastering flow and how the data in your sources overlaps. They are based on the most recent mastering flow run.

For legacy data products, whenever an insight card shows a blue hover state, you can select the card to view more details.

These insights are only available if you have at least 2 sources.

How do my source records overlap: This insight shows how many clusters include records from each source, and how many clusters contain records from each pair of source datasets.

In the example source overlap metrics below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv. The insight shows how many clusters include records from each of these sources, and how many clusters contain records from each pair of sources. For example:

  • The first box, containing the number 145,573, means that there are 145,573 clusters that contain records from the Hubspot.csv dataset.
  • The next box in the top row, containing the number 210, means there are 210 clusters that contain records from the Hubspot.csv and Marketo.csv datasets.
  • The fourth box in the top row, containing the number 669, means that there are 669 clusters that contain records from the Hubspot.csv and demo_dataset datasets.

From these metrics, you can see that the SalesforceContacts.csv and demo_data datasets overlap the most (18,914 clusters), and the Hubspot.csv and Marketo.csv datasets overlap the least (210 clusters).
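
The counting behind this insight can be sketched in a few lines of Python. This is an illustration only, not how the product computes the metric: the function name `source_overlap` and the input shape (a list of `(cluster_id, source_name)` pairs) are assumptions made for the example, and the sample records are invented.

```python
from collections import defaultdict
from itertools import combinations


def source_overlap(records):
    """Count clusters per source and per pair of sources.

    `records` is a hypothetical iterable of (cluster_id, source_name) pairs.
    """
    # Collect the distinct sources that contribute to each cluster.
    sources_by_cluster = defaultdict(set)
    for cluster_id, source in records:
        sources_by_cluster[cluster_id].add(source)

    per_source = defaultdict(int)  # clusters that contain each source
    per_pair = defaultdict(int)    # clusters that contain both sources of a pair
    for sources in sources_by_cluster.values():
        for source in sources:
            per_source[source] += 1
        for pair in combinations(sorted(sources), 2):
            per_pair[frozenset(pair)] += 1
    return dict(per_source), dict(per_pair)


# Invented sample data: three clusters built from three sources.
records = [
    ("c1", "Hubspot.csv"), ("c1", "Marketo.csv"),
    ("c2", "Hubspot.csv"),
    ("c3", "Hubspot.csv"), ("c3", "Marketo.csv"), ("c3", "demo_data"),
]
per_source, per_pair = source_overlap(records)
print(per_source["Hubspot.csv"])                          # clusters with Hubspot.csv records
print(per_pair[frozenset(("Hubspot.csv", "Marketo.csv"))])  # clusters shared by the pair
```

Each cluster is counted once per source and once per pair, no matter how many records a source contributes to it, which matches the insight's wording of "clusters that contain records from" a source.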

Sample source insights

How many of my clusters do not contain source records from another source: This insight shows how many clusters include records from a single source, and how many clusters include records from two, three, or more sources.

In the example below:

  • 910K clusters contain records from only one source.
  • 27.9K clusters contain records from two sources.
  • 1,500 clusters contain records from three sources.
  • 161 clusters contain records from four or more sources.
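
As a rough sketch of how such a breakdown could be derived, the Python below groups clusters by the number of distinct sources they contain and folds everything from four sources upward into one bucket. The function name `clusters_by_source_count`, the `cap` parameter, and the `(cluster_id, source_name)` input shape are assumptions for illustration, and the sample records are invented.

```python
from collections import Counter, defaultdict


def clusters_by_source_count(records, cap=4):
    """Count clusters by how many distinct sources they contain.

    `records` is a hypothetical iterable of (cluster_id, source_name) pairs.
    Clusters with `cap` or more sources share one bucket ("four or more").
    """
    sources_by_cluster = defaultdict(set)
    for cluster_id, source in records:
        sources_by_cluster[cluster_id].add(source)
    return Counter(min(len(sources), cap) for sources in sources_by_cluster.values())


# Invented sample data: four clusters.
records = [
    ("c1", "Hubspot.csv"), ("c1", "Marketo.csv"),
    ("c2", "Hubspot.csv"), ("c2", "Hubspot.csv"),  # two records, one source
    ("c3", "Hubspot.csv"), ("c3", "Marketo.csv"), ("c3", "demo_data"),
    ("c3", "demo_dataset"), ("c3", "SalesforceContacts.csv"),
    ("c4", "Marketo.csv"),
]
distribution = clusters_by_source_count(records)
print(distribution[1], distribution[2], distribution[3], distribution[4])
```

Cluster `c2` lands in the single-source bucket even though it holds two records, because the breakdown depends on distinct sources rather than record counts.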

Sample insights

Which of my sources have the least overlap: For each dataset in the left column, this insight shows how many of its clusters contain no records from the corresponding dataset in the top row.

In the example below, there are five source datasets: Hubspot.csv, Marketo.csv, demo_data, demo_dataset, and SalesforceContacts.csv.

  • Of the clusters that contain records from the Hubspot.csv dataset, there are 145,363 clusters that have no records from the Marketo.csv dataset.
  • Of the clusters that contain records from the Marketo.csv dataset, there are 10,665 clusters that have no records from the Hubspot.csv dataset.
  • Of the clusters that contain records from the demo_data dataset, there are 269,102 clusters that have no records from the SalesforceContacts.csv dataset.
  • Of the clusters that contain records from the demo_dataset dataset, there are 1,688 clusters that have no records from the SalesforceContacts.csv dataset.
  • Of the clusters that contain records from the SalesforceContacts.csv dataset, there are 523,173 clusters that have no records from the Marketo.csv dataset.

From these metrics, you can tell that the least overlap is between the SalesforceContacts.csv dataset and the Marketo.csv dataset: 523,173 clusters with SalesforceContacts.csv records contain no Marketo.csv records. In this example, the SalesforceContacts.csv dataset has less overlap with the other sources.
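
These counts relate to the pairwise overlap counts by simple subtraction: the clusters from a row dataset that have no records from a column dataset are the row dataset's clusters minus the clusters the two datasets share. The Python sketch below checks this with the Hubspot.csv figures quoted on this page, assuming both examples come from the same mastering flow run. The function name `clusters_without` and the dictionary layout are hypothetical.

```python
def clusters_without(per_source, per_pair, row, column):
    """Clusters that contain `row` records but no `column` records."""
    shared = per_pair.get(frozenset((row, column)), 0)
    return per_source[row] - shared


# Figures quoted in the examples on this page.
per_source = {"Hubspot.csv": 145_573}
per_pair = {frozenset(("Hubspot.csv", "Marketo.csv")): 210}

hubspot_without_marketo = clusters_without(
    per_source, per_pair, "Hubspot.csv", "Marketo.csv"
)
print(hubspot_without_marketo)  # 145363
```

Because the result depends on the total for the row dataset, the grid is not symmetric: the value for Hubspot.csv against Marketo.csv (145,363) differs from the value for Marketo.csv against Hubspot.csv (10,665).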

Sample insights
