You use the People Mastering template to master data about people, such as B2C customers or contacts.
The People Mastering template uses an industry-standard schema and trained machine learning model. The template also provides enrichment for addresses and phone numbers, helping to ensure that you have the most complete and up-to-date information.
Source Dataset Requirements
As part of the mastering flow, Tamr Cloud aligns your input datasets to a unified schema with predefined output fields. Tamr Cloud uses these predefined output fields to enrich your data and consolidate similar records into entities.
In addition to the general Requirements for Input Datasets, certain data is required for People Mastering.
You must map one or more input fields to each of the predefined output fields:
This is the primary key used in the source dataset to uniquely identify each record. See About Primary Keys for more information.
This is a non-unique key, such as a customer or contact identification number used by your internal systems. The clustering model always clusters together records that have the same trusted ID. If the values in this field do not represent a definite match, map an empty placeholder field to trusted_id, and then add the following transformation in the Create tamr_record_id step:
SELECT *, '' as trusted_id;
region (for example: state, territory, and so on)
Records with the same trusted ID are clustered together; records with different trusted IDs are not clustered together. Records with a null or empty trusted_id are clustered based on similarity, meaning that they may be clustered with records that have a trusted ID.
Additionally, the clustering model considers the similarity of values for the following fields, and then uses decision-tree logic to accurately identify records that refer to the same entity:
- Street address
- Phone number
- User email and email domain
Tamr Cloud looks for similarities in these fields, but does not require exact matches.
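The trusted ID rules described above can be sketched as a small pairwise check. This is an illustration only, not Tamr Cloud's implementation: the function name and the returned labels are hypothetical, and the sketch covers only the trusted_id rule, not the similarity model that handles the remaining cases.

```python
from typing import Optional


def trusted_id_decision(id_a: Optional[str], id_b: Optional[str]) -> str:
    """Describe how a pair of records is treated based on trusted_id alone.

    Hypothetical helper for illustration; the labels are not Tamr terms.
    """
    # A null or empty trusted_id on either record means the pair is
    # decided by field similarity (address, phone, email, and so on).
    if not id_a or not id_b:
        return "similarity"
    # Matching trusted IDs are always clustered together;
    # differing trusted IDs are never clustered together.
    return "same_cluster" if id_a == id_b else "different_clusters"


print(trusted_id_decision("C-100", "C-100"))
print(trusted_id_decision("C-100", "C-200"))
print(trusted_id_decision("C-100", ""))
```

The third call shows why a record without a trusted ID can still end up in a cluster with a record that has one: the decision falls through to similarity.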
Modifying the Mastering Flow
Designer Flow Step Overview
When you create a data product using the People Mastering template, Tamr Cloud creates a mastering flow in Designer with steps specific to people mastering:
- Add Data
- Align to Person Data Model
- Create tamr_record_id
- Prepare Data for Enrichment
- Enrich Phone
- Enrich Address
- Prepare for Clustering
- Apply Clustering Model
- Consolidate Records
- Deliver Data to Studio
Add Data
Ensure your input data meets both the general requirements for input datasets and the specific requirements for this template. Then, see Adding a Dataset.
Align to Person Data Model
See Mapping Input Fields to a Unified Schema.
Create tamr_record_id
This transformation step ensures that each source record has a unique primary key across all source datasets by adding a new primary key field:
- For data products created before June 1, 2023, this step produces a tamr_record_id for each source record by concatenating the source dataset name and the source primary key, separated by an underscore.
- For data products created on June 1, 2023 or later, this step produces a tamr_record_id for each source record by creating a 128-bit hash value of the source dataset name and the source primary key. See the Tamr Core documentation for a description of the function used to generate the hash value.
Important: If records within the same source dataset have duplicate primary key values, the tamr_record_id values for those records are also duplicates.
You do not need to modify this step.
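The two ways of building a tamr_record_id can be sketched as follows. The function names are hypothetical, and the hash shown is an assumption: BLAKE2b truncated to 16 bytes is used only because it yields a 128-bit value. The actual hash function is the one described in the Tamr Core documentation, and its output will differ from this sketch.

```python
import hashlib


def record_id_concatenated(dataset_name: str, primary_key: str) -> str:
    """Older style: dataset name and primary key joined by an underscore."""
    return f"{dataset_name}_{primary_key}"


def record_id_hashed(dataset_name: str, primary_key: str) -> str:
    """Newer style: a 128-bit hash of dataset name and primary key.

    ASSUMPTION: BLAKE2b with a 16-byte digest stands in for the real
    function; the separator byte is also an illustrative choice.
    """
    payload = f"{dataset_name}\x00{primary_key}".encode("utf-8")
    return hashlib.blake2b(payload, digest_size=16).hexdigest()


# The same primary key in two datasets gets two different IDs.
print(record_id_concatenated("crm", "42"))
print(record_id_concatenated("billing", "42"))

# Duplicate primary keys inside one dataset produce duplicate IDs.
print(record_id_hashed("crm", "42") == record_id_hashed("crm", "42"))
```

Either approach makes keys unique across datasets, but neither can repair duplicates within a single dataset, which is why the source primary key must already be unique there.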
Prepare Data for Enrichment
This step transforms the data in the unified dataset to match the expected inputs to the enrichers included in the mastering flow.
You do not need to make changes to this step.
Enrich Phone
This step standardizes and enriches phone number data. You do not need to make changes to this step.
See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.
Enrich Address
This step standardizes and validates address information, and enriches addresses with latitude, longitude, and detailed address information. You do not need to make changes to this step.
See Address Standardization, Validation, and Geocoding for information about this enricher, including the output fields it adds to your data.
Prepare for Clustering
This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
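As a rough illustration of the array idea, the sketch below collects a source value and an enriched value into one array per record, then finds the best-matching pair across two records. Everything here is an assumption made for illustration: the field names (phone, enriched_phone), the helper names, and the use of Python's difflib ratio as the similarity measure. The trained model's actual features and weighting are not reproduced.

```python
from difflib import SequenceMatcher


def build_ml_array(*values):
    """Collect non-empty values into one array, keeping first-seen order.

    Dropping repeats is an illustrative choice, not documented behavior.
    """
    result = []
    for value in values:
        if value and value not in result:
            result.append(value)
    return result


def best_similarity(array_a, array_b):
    """Highest pairwise similarity between two arrays (0.0 if either is empty)."""
    return max(
        (
            SequenceMatcher(None, a.lower(), b.lower()).ratio()
            for a in array_a
            for b in array_b
        ),
        default=0.0,
    )


# Hypothetical records: one has a raw and an enriched phone, the other
# only has the standardized form.
record_a = {"phone": "617-555-0100", "enriched_phone": "+16175550100"}
record_b = {"phone": "+16175550100", "enriched_phone": None}

ml_phone_a = build_ml_array(record_a["phone"], record_a["enriched_phone"])
ml_phone_b = build_ml_array(record_b["phone"], record_b["enriched_phone"])
print(best_similarity(ml_phone_a, ml_phone_b))
```

Comparing arrays rather than single values lets a record match on its enriched form even when its raw source value is formatted differently.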
Apply Clustering Model
This read-only step groups records that refer to the same entity into a cluster, using the trained model.
Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Datasets Available for Export.
Consolidate Records
This step applies rules to produce a single record, called the mastered entity record, that best represents a cluster. For most fields, these rules select the most common value from the clustered records.
Additionally, this step adds a Tamr ID (tamr_id) to each mastered entity record. The Tamr ID is a unique, persistent ID.
If you added a new field in the Schema Mapping step, add a line in the transformations to tell Tamr Cloud what value to set for that field when creating the mastered entity record. See Modifying Record Consolidation Transformations.
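The "most common value" rule can be sketched in a few lines. This is an illustration under stated assumptions, not the template's consolidation logic: the function names and sample fields are hypothetical, and skipping null or empty values before counting is an assumption of this sketch.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if there is none."""
    # ASSUMPTION: null and empty values are ignored when counting.
    present = [v for v in values if v not in (None, "")]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]


def consolidate(cluster, fields):
    """Build one mastered record by picking the most common value per field."""
    return {
        field: most_common_value(record.get(field) for record in cluster)
        for field in fields
    }


# Hypothetical cluster of three source records for one person.
cluster = [
    {"first_name": "Ann", "city": "Boston"},
    {"first_name": "Ann", "city": ""},
    {"first_name": "Anne", "city": "Boston"},
]
print(consolidate(cluster, ["first_name", "city"]))
```

A field added during schema mapping has no rule of this kind until you add one, which is why the consolidation transformations need a line for each new field.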
Deliver Data to Studio
This step allows you to configure how data appears in Studio, Curator, and exported datasets. See Configuring Data Display in Studio.