People Mastering

You use the People Mastering template to master data about people, such as B2C customers or contacts.

The People Mastering template uses an industry-standard schema and a trained machine learning model. The template also enriches addresses and phone numbers, which helps keep your information complete and up to date.

Source Dataset Requirements

As part of the mastering flow, Tamr Cloud aligns your input datasets to a unified schema with predefined output fields. Tamr Cloud uses these predefined output fields to enrich your data and consolidate similar records into entities.

In addition to the general Requirements for Input Datasets, certain data is required for People Mastering.

You must map one or more input fields to each of the predefined output fields:

  • uniqueKey
    This is the primary key used in the source dataset to uniquely identify each record. See About Primary Keys for more information.
  • trusted_id
    This is a non-unique key, such as a customer or contact identification number used by your internal systems. The clustering model always clusters together records that have the same trusted_id value. If identical values in this field do not represent a definite match, map an empty placeholder field to trusted_id, and then add the following transformation in the Create tamr_record_id step: SELECT *, '' as trusted_id;
  • address_line_1
  • address_line_2
  • city
  • country
  • email
  • first_name
  • last_name
  • middle_name
  • name_suffix
  • phone
  • postal_code
  • region (for example, state or territory)
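
Before you upload data, it can help to confirm that every predefined output field has at least one input field planned for it. The following is a minimal sketch in Python, not part of Tamr Cloud: the function name, the mapping structure, and the input field names (customer_pk, fname, and so on) are assumptions made for illustration. Only the output field names come from the list above.

```python
# Output fields required by the People Mastering template (from the list above).
REQUIRED_OUTPUT_FIELDS = [
    "uniqueKey", "trusted_id", "address_line_1", "address_line_2", "city",
    "country", "email", "first_name", "last_name", "middle_name",
    "name_suffix", "phone", "postal_code", "region",
]


def unmapped_output_fields(mapping):
    """Return the required output fields that have no input field mapped."""
    return [field for field in REQUIRED_OUTPUT_FIELDS if not mapping.get(field)]


# Hypothetical planning structure: output field -> list of input fields.
example_mapping = {
    "uniqueKey": ["customer_pk"],
    "first_name": ["fname"],
    "last_name": ["lname"],
    "email": ["email_primary", "email_secondary"],
}

missing = unmapped_output_fields(example_mapping)
print("Still unmapped:", ", ".join(missing))
```

An output field can list more than one input field, which mirrors the "one or more input fields" rule above.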

Clustering Model

Records with the same trusted ID are clustered together; records with different trusted IDs are not clustered together. Records with a null or empty trusted_id are clustered based on similarity, so they can be clustered with records that have a trusted ID. The clustering model also considers the similarity of values in the following fields, and then uses decision-tree logic to identify records that refer to the same entity:

  • Name
  • Street address
  • Phone number
  • User email and email domain

Tamr Cloud looks for similarities in these fields, but does not require exact matches.
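
The trusted ID rule described above can be illustrated with a small Python sketch. This is not the clustering model itself: it only separates records that are grouped by a shared trusted_id from records that are left to similarity matching. The function name and sample records are hypothetical, and the similarity step performed by the trained model is deliberately not reproduced.

```python
from collections import defaultdict


def group_by_trusted_id(records):
    """Group record IDs by trusted_id; collect records without one separately."""
    groups = defaultdict(list)
    needs_similarity = []
    for record in records:
        trusted_id = (record.get("trusted_id") or "").strip()
        if trusted_id:
            # Same trusted_id: always placed in the same group.
            groups[trusted_id].append(record["tamr_record_id"])
        else:
            # Null or empty trusted_id: left to similarity-based clustering.
            needs_similarity.append(record["tamr_record_id"])
    return dict(groups), needs_similarity


# Hypothetical sample records.
sample_records = [
    {"tamr_record_id": "a1", "trusted_id": "CUST-7"},
    {"tamr_record_id": "b2", "trusted_id": "CUST-7"},
    {"tamr_record_id": "c3", "trusted_id": "CUST-9"},
    {"tamr_record_id": "d4", "trusted_id": ""},
    {"tamr_record_id": "e5", "trusted_id": None},
]

trusted_groups, similarity_candidates = group_by_trusted_id(sample_records)
print(trusted_groups)
print(similarity_candidates)
```

In this sketch, the records in similarity_candidates could still end up in the same cluster as a1 and b2 once similarity is evaluated, which matches the behavior described for records with an empty trusted_id.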

Modifying the Mastering Flow

Designer Flow Step Overview

When you create a data product using the People Mastering template, Tamr Cloud creates a mastering flow in Designer with steps specific to people mastering:

Add Data

Ensure your input data meets both the general requirements for input datasets and the specific requirements for this template.

Then, see Adding a Dataset.

Align to Person Data Model

See Mapping Input Fields to a Unified Schema.

Create tamr_record_id

This transformation step ensures that each source record has a unique primary key across all source datasets by adding a new primary key field: tamr_record_id.

For data products created before June 1, 2023, this step produces a tamr_record_id for each source record by concatenating the source dataset name and the source primary key, separated by an underscore.

For data products created on June 1, 2023 or later, this step produces a tamr_record_id for each source record by creating a 128-bit hash value of the source dataset name and the source primary key. See the Tamr Core documentation for a description of the function used to generate the hash value.

Important: If records within the same source dataset have duplicate primary key values, those records also receive duplicate tamr_record_id values.

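You do not need to modify this step unless you are adding the trusted_id placeholder transformation described in Source Dataset Requirements.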
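
The two tamr_record_id schemes can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function names are hypothetical, and BLAKE2b with a 16-byte digest is used only as a stand-in that produces a 128-bit value. The actual hash function is the one described in the Tamr Core documentation, and the way the dataset name and primary key are combined before hashing is also an assumption.

```python
import hashlib


def legacy_record_id(dataset_name, primary_key):
    """Data products created before June 1, 2023: name and key joined by an underscore."""
    return f"{dataset_name}_{primary_key}"


def hashed_record_id(dataset_name, primary_key):
    """Later data products: a 128-bit hash of the dataset name and primary key.

    Assumption: BLAKE2b (16-byte digest) stands in for the real hash function,
    and a NUL separator is used to combine the two inputs.
    """
    payload = f"{dataset_name}\x00{primary_key}".encode("utf-8")
    return hashlib.blake2b(payload, digest_size=16).hexdigest()


print(legacy_record_id("crm_contacts", "42"))

# The same primary key in two different datasets yields different IDs.
print(hashed_record_id("crm_contacts", "42") != hashed_record_id("web_signups", "42"))

# A duplicated primary key within one dataset yields a duplicated ID.
print(hashed_record_id("crm_contacts", "42") == hashed_record_id("crm_contacts", "42"))
```

Both schemes are deterministic functions of the dataset name and primary key, which is why duplicate primary keys within one source dataset carry through to tamr_record_id.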

Prepare Data for Enrichment

This step transforms the data in the unified dataset to match the expected inputs to the enrichers included in the mastering flow.

You do not need to make changes to this step.

Enrich Phone

This step standardizes and enriches phone number data. You do not need to make changes to this step.

See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.

Enrich Address

This step standardizes and validates address information, and enriches addresses with latitude, longitude, and detailed address information. You do not need to make changes to this step.

See Address Standardization, Validation, and Geocoding for information about this enricher, including the output fields it adds to your data.

Prepare for Clustering

This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
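
As a rough illustration of how an ml_ array field might be assembled, the Python sketch below collects the non-empty values of a unified source field and an enrichment field into one list. The helper name, the enriched_phone field, and the ml_phone variable are hypothetical; the actual field names and transformations are defined in the step itself.

```python
def build_ml_array(record, field_names):
    """Collect the non-empty values of several fields into one de-duplicated list."""
    values = []
    for name in field_names:
        value = record.get(name)
        if value and value not in values:
            values.append(value)
    return values


# Hypothetical record after enrichment.
record = {
    "phone": "(617) 555-0100",
    "enriched_phone": "+16175550100",  # assumed enrichment output field
    "email": "",
}

ml_phone = build_ml_array(record, ["phone", "enriched_phone"])
print(ml_phone)
```

Keeping both the original and the enriched value in the array gives a comparison more candidate values to find the most similar pair between two records.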

Apply Clustering Model

This read-only step groups records that refer to the same entity into a cluster, using the trained model.

Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Datasets Available for Export.

Consolidate Records

This step applies rules to produce a single record, called the mastered entity record, that best represents a cluster. For most fields, these rules select the most common value from the clustered records.

Additionally, this step adds a Tamr ID (tamr_id) to each mastered entity record. The Tamr ID is a unique, persistent ID.

If you added a new field when mapping input fields in the Align to Person Data Model step, add a line to the transformations in this step to tell Tamr Cloud what value to set for that field when creating the mastered entity record. See Modifying Record Consolidation Transformations.
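
The "most common value" rule used for most fields can be sketched in Python as shown below. This is a simplified illustration, not the consolidation transformations themselves: the function names and sample cluster are hypothetical, and ignoring empty values is an assumption of this sketch.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if there is none."""
    non_empty = [value for value in values if value not in (None, "")]
    if not non_empty:
        return None
    return Counter(non_empty).most_common(1)[0][0]


def consolidate(cluster, fields):
    """Build one representative record from the records in a cluster."""
    return {
        field: most_common_value([record.get(field) for record in cluster])
        for field in fields
    }


# Hypothetical cluster of three source records for the same person.
cluster = [
    {"city": "Boston", "phone": "617-555-0100"},
    {"city": "Boston", "phone": None},
    {"city": "Bostn", "phone": "617-555-0100"},
]

mastered = consolidate(cluster, ["city", "phone", "email"])
print(mastered)
```

A field added during schema mapping would need its own rule in the same spirit: decide which value from the clustered records the mastered entity record should carry.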

Deliver Data to Studio

This step allows you to configure how data appears in Studio, Curator, and exported datasets. See Configuring Data Display in Studio.