Use the People Mastering template to master person data, such as B2C customers or contacts, using an industry-standard schema and a trained machine learning model. The template also provides enrichment for addresses and phone numbers, helping to ensure that you have the most complete and up-to-date information.
As part of the mastering flow, Tamr Cloud aligns your input datasets to a unified schema with predefined output fields. Tamr Cloud uses these predefined output fields to enrich your data and consolidate similar records into entities.
In addition to the general Requirements for Input Datasets, certain data is required for People Mastering.
You must map one or more input fields to each of the predefined output fields:
This is the primary key for the dataset. See About Primary Keys for more information.
This is a unique identifier, such as a Social Security Number or a contact ID. The clustering model considers records with the same value for this field to be a match, so map a field to this output field only if identical values represent a definite match. If the values in the input do not represent a definite match, map an empty placeholder field to this output field instead.
region (for example: state, territory, and so on)
Except in rare edge cases, records with the same trusted ID are clustered together, while records with different trusted IDs are not clustered together. Additionally, the clustering model considers the similarity of values for the following fields, and then uses decision-tree logic to accurately identify records that refer to the same entity:
- Street address
- Phone number
- User email and email domain
Tamr Cloud looks for similarities in these fields, but does not require exact matches.
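The matching behavior described above can be illustrated with a minimal sketch. This is not Tamr Cloud's trained model: the record keys (`trusted_id`, `address`, `phone`, `email`), the averaging of scores, and the `0.85` threshold are all assumptions chosen for illustration, and the standard-library `difflib` ratio stands in for whatever similarity measures the real model uses.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_same_person(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical stand-in for the clustering decision on one record pair."""
    # Trusted ID rule: when both records carry an ID, the ID alone decides.
    id_a, id_b = rec_a.get("trusted_id"), rec_b.get("trusted_id")
    if id_a and id_b:
        return id_a == id_b

    # Otherwise compare the fuzzy fields that both records populate.
    # Exact equality is not required; close values still score highly.
    fields = ("address", "phone", "email")  # assumed field names
    scores = [
        similarity(rec_a[f], rec_b[f])
        for f in fields
        if rec_a.get(f) and rec_b.get(f)
    ]
    # With no comparable fields there is no evidence of a match.
    return bool(scores) and sum(scores) / len(scores) >= threshold
```

Under these assumptions, two records that share a trusted ID match regardless of their other fields, while records without trusted IDs match only when their shared fields are similar enough.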
When you create an entity type using the People Mastering template, Tamr Cloud creates a mastering flow in Designer with steps specific to people mastering:
- Add Data
- Align to Person Data Model
- Prepare Data for Enrichment
- Enrich Phone
- Enrich Address
- Prepare for Clustering
- Apply Clustering Model
- Consolidate Records
- Deliver to Studio
Then, see Adding a Dataset.
This step transforms the data in the unified dataset to match the expected inputs to the enrichers included in the mastering flow.
You do not need to make changes to this step.
This step standardizes and enriches phone number data. You do not need to make changes to this step.
See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.
This step standardizes and enriches address data. You do not need to make changes to this step.
See Address Enrichment for information about this enricher, including the output fields it adds to your data.
This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
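As a rough illustration of comparing array-valued fields, the sketch below finds the highest similarity between any value in one record's array and any value in another's. The function name and the use of `difflib` are assumptions for illustration only; the trained model's actual similarity measures and weighting are not described here.

```python
from difflib import SequenceMatcher
from itertools import product


def best_pair_similarity(values_a: list[str], values_b: list[str]) -> float:
    """Return the highest similarity over all cross pairs of values.

    Returns 0.0 when either array is empty, since nothing can be compared.
    """
    return max(
        (
            SequenceMatcher(None, a.lower(), b.lower()).ratio()
            for a, b in product(values_a, values_b)
        ),
        default=0.0,
    )


# Hypothetical array field: one record carries a raw and an enriched phone value.
record_a_phones = ["555-0100", "555-0199"]
record_b_phones = ["555-0199"]
print(best_pair_similarity(record_a_phones, record_b_phones))
```

Comparing arrays this way means a record matches on a field if any of its candidate values, whether from the source or from enrichment, lines up with a value on the other record.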
This read-only step groups records that refer to the same entity into a cluster, using the trained model.
Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Available Published Datasets.
This step applies rules to produce a single entity record that best represents a cluster. For most fields, these rules select the most common value from the clustered records.
Additionally, this step adds a Tamr ID (tamr_id) to each entity. The Tamr ID is a unique, persistent ID.
If you added a new field in the Schema Mapping step, add a line in the transformations to tell Tamr what value to set for that field when creating the mastered entity record. See Modifying Transformations for Your Data.
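The most-common-value rule that this step applies to most fields can be sketched as follows. This is an illustrative approximation, not the transformation code that Tamr Cloud generates: the function name, the treatment of empty values, and the tie-breaking behavior are assumptions.

```python
from collections import Counter


def consolidate(cluster: list[dict]) -> dict:
    """Build one entity record by taking the most common value per field."""
    # Collect every field name that appears in any record of the cluster.
    fields = {name for record in cluster for name in record}
    entity = {}
    for name in sorted(fields):
        # Ignore missing and empty values so they cannot win the vote.
        values = [r[name] for r in cluster if r.get(name) not in (None, "")]
        # On a tie, Counter returns the value that was seen first.
        entity[name] = Counter(values).most_common(1)[0][0] if values else None
    return entity


cluster = [
    {"city": "Boston", "phone": "555-0100"},
    {"city": "Boston", "phone": ""},
    {"city": "Bostn", "phone": "555-0100"},
]
print(consolidate(cluster))  # {'city': 'Boston', 'phone': '555-0100'}
```

A field added during schema mapping would need its own rule of this kind, which is why the transformations must state what value to use for it.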
This step allows you to configure how entity data appears in Studio, Curator, and published datasets. See Configuring Data Display in Studio.