Contact Mastering
You use the Contact Mastering template to master marketing data commonly stored in customer relationship management (CRM) software, such as B2C and B2B contacts.
The Contact Mastering template uses an industry-standard schema and a trained machine learning model. The template also provides enrichment for phone numbers and for contact and organization addresses, helping to ensure that you have the most complete and up-to-date information.
Source Dataset Requirements
As part of the mastering flow, Tamr Cloud aligns your source datasets to a unified schema with predefined output attributes. Tamr Cloud uses these predefined output attributes to enrich your data and consolidate similar records into entities.
In addition to the general Requirements for Input Datasets, certain data is required for Contact Mastering.
You must map one or more source fields to each of the predefined output attributes:
- contact_address_line_1
- contact_address_line_2
- contact_city
- contact_country
- contact_postal_code
- contact_region
- email
- fax_number
- first_name
- last_name
- middle_name
- name_prefix
- name_suffix
- org_address_line_1
- org_address_line_2
- org_city
- org_country
- org_name
- org_name_alt
- org_postal_code
- org_region
- phone_number
  Note: Map the contact's phone number, not the organization's phone number, to this attribute.
- phone_number_alt
- professional_title
- unique_key
  This is the primary key used in the source dataset to uniquely identify each record. See About Primary Keys for more information.
- trusted_id
  This is a non-unique key, such as a customer or contact identification number used by your internal systems. The clustering model always clusters together records that have the same trusted ID. If the values in this field do not represent a definite match, map an empty placeholder field to trusted_id, and then add the following transformation in the Create tamr_record_id step: SELECT *, '' as trusted_id;
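Before configuring the flow, it can help to confirm that every predefined output attribute has at least one source field mapped to it. The following Python sketch is illustrative only and is not part of Tamr Cloud: the `missing_attributes` helper and the shape of the `mapping` dictionary (output attribute name to a list of source field names) are assumptions made for this example.

```python
# Predefined output attributes listed above for the Contact Mastering template.
REQUIRED_ATTRIBUTES = [
    "contact_address_line_1", "contact_address_line_2", "contact_city",
    "contact_country", "contact_postal_code", "contact_region",
    "email", "fax_number", "first_name", "last_name", "middle_name",
    "name_prefix", "name_suffix",
    "org_address_line_1", "org_address_line_2", "org_city", "org_country",
    "org_name", "org_name_alt", "org_postal_code", "org_region",
    "phone_number", "phone_number_alt", "professional_title",
    "unique_key", "trusted_id",
]


def missing_attributes(mapping):
    """Return the required attributes that have no source field mapped.

    `mapping` is a hypothetical dict: output attribute -> list of source fields.
    """
    return [attr for attr in REQUIRED_ATTRIBUTES if not mapping.get(attr)]


# Example: a mapping that covers every attribute except fax_number.
example_mapping = {attr: ["src_" + attr] for attr in REQUIRED_ATTRIBUTES}
example_mapping["fax_number"] = []
print(missing_attributes(example_mapping))  # prints ['fax_number']
```

An empty result means each attribute has at least one mapped source field, including any empty placeholder field mapped to trusted_id.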
Clustering Model
Records with the same trusted IDs are clustered together; records with different trusted IDs are not clustered together. Records with null/empty trusted IDs are clustered based on similarity, meaning that they may be clustered with records that have a trusted ID. Additionally, the clustering model considers the similarity of values for the following fields, and then uses decision-tree logic to accurately identify records that refer to the same entity:
- Name
- Street address
- Phone number
- User email
Tamr Cloud looks for similarities in these fields, but does not require exact matches.
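The trusted ID rules described above can be summarized as a small decision function. This Python sketch covers only the trusted ID part of the logic, not the trained model itself. The function name, the return labels, and the choice to treat whitespace-only values as empty are assumptions made for illustration.

```python
def trusted_id_rule(trusted_id_a, trusted_id_b):
    """Classify a pair of records using only the trusted ID rules.

    Returns "same_cluster", "different_clusters", or "use_similarity".
    """
    # ASSUMPTION: None, empty, and whitespace-only values all count as "no trusted ID".
    a = (trusted_id_a or "").strip()
    b = (trusted_id_b or "").strip()
    if not a or not b:
        # At least one record has no trusted ID, so similarity of name,
        # street address, phone number, and email decides the outcome.
        return "use_similarity"
    return "same_cluster" if a == b else "different_clusters"


print(trusted_id_rule("C-1001", "C-1001"))  # prints same_cluster
print(trusted_id_rule("C-1001", "C-2002"))  # prints different_clusters
print(trusted_id_rule("C-1001", ""))        # prints use_similarity
```

The third case shows why a record without a trusted ID can still end up in a cluster with a record that has one.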
Modifying the Mastering Flow
Designer Flow Step Overview
When you create a data product using the Contact Mastering template, Tamr Cloud creates a mastering flow in Designer with steps specific to Contact Mastering:
- Add Data
- Align to Contacts Data Model
- Create tamr_record_id
- Prepare Data for Primary Phone Enrichment
- Enrich Primary Phone
- Prepare Data for Alt Phone Enrichment
- Enrich Alt Phone
- Prepare Data for Org Address
- Enrich Org Address
- Prepare Data for Contact Address Enrichment
- Enrich Contact Address
- Prepare for Clustering
- Apply Clustering Model
- Consolidate Records
- Deliver Data to Studio
Add Data
Ensure your input data meets both the general requirements for input datasets and the specific requirements for this template.
Then, see Adding a Data Source.
Align to Contacts Data Model
See Mapping Input Fields to a Unified Schema.
Create tamr_record_id
This transformation step ensures that each source record has a unique primary key across all source datasets by adding a new primary key field: tamr_record_id.
For data products created before June 1, 2023, this step produces a tamr_record_id for each source record by concatenating the source dataset name and the source primary key, separated by an underscore.
For data products created on June 1, 2023 or later, this step produces a tamr_record_id for each source record by creating a 128-bit hash value of the source dataset name and the source primary key. See the Tamr Core documentation for a description of the function used to generate the hash value.
Important: If records within the same source dataset have duplicate primary key values, the tamr_record_id values for those records are also duplicates.
You do not need to modify this step.
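The two ways of deriving tamr_record_id can be sketched in Python as follows. This is an illustration under stated assumptions, not the function Tamr Cloud uses: MD5 is a stand-in chosen only because it produces a 128-bit value, and the `|` separator, function names, and sample dataset name are hypothetical.

```python
import hashlib


def legacy_record_id(dataset_name, primary_key):
    """Pre-June 1, 2023 style: dataset name and primary key joined by an underscore."""
    return f"{dataset_name}_{primary_key}"


def hashed_record_id(dataset_name, primary_key):
    """Illustrative 128-bit hash of the dataset name and primary key.

    ASSUMPTION: MD5 and the "|" separator are stand-ins for this sketch;
    the actual hash function is described in the Tamr Core documentation.
    """
    payload = f"{dataset_name}|{primary_key}".encode("utf-8")
    return hashlib.md5(payload).hexdigest()  # 32 hex characters = 128 bits


print(legacy_record_id("crm_contacts", "1001"))  # prints crm_contacts_1001

# Duplicate primary keys within one dataset yield duplicate IDs either way.
print(hashed_record_id("crm_contacts", "1001") == hashed_record_id("crm_contacts", "1001"))  # prints True
```

Both approaches are deterministic functions of the dataset name and primary key, which is why duplicate primary keys in a single source dataset carry through to tamr_record_id.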
Prepare Data for Primary Phone Enrichment
This step transforms the data in the unified dataset to match the expected inputs for the phone number enrichment service.
You do not need to make changes to this step.
Enrich Primary Phone
This step validates, standardizes, and enriches phone number data for each contact’s primary phone number, stored in the phone_number field in the unified dataset. You do not need to make changes to this step.
See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.
Prepare Data for Alt Phone Enrichment
This step prepares the unified dataset ahead of enriching phone number data for each contact’s alternate phone number, stored in the phone_number_alt field in the unified dataset.
This step renames the phone number enrichment fields returned for the primary phone number to prepend each of these fields with primary_. For example, enriched_phone_carrier becomes primary_enriched_phone_carrier. These changes allow you to easily identify enrichment fields related to the primary phone number.
You do not need to make changes to this step.
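The renaming described above amounts to prepending a prefix to each enrichment field name. The following Python sketch shows the idea on a single record represented as a dictionary. The helper name, the rule that enrichment fields are recognized by the `enriched_` prefix, and the sample values are assumptions for illustration.

```python
def prefix_enrichment_fields(record, prefix, marker="enriched_"):
    """Return a copy of `record` with `prefix` prepended to enrichment field names.

    ASSUMPTION: enrichment fields are the ones whose names start with `marker`.
    """
    return {
        (prefix + name if name.startswith(marker) else name): value
        for name, value in record.items()
    }


record = {
    "phone_number": "+1 617 555 0100",            # made-up sample value
    "enriched_phone_carrier": "Example Carrier",  # made-up sample value
}
renamed = prefix_enrichment_fields(record, "primary_")
print(sorted(renamed))  # prints ['phone_number', 'primary_enriched_phone_carrier']
```

Because already-renamed fields no longer start with `enriched_`, a later call with a different prefix such as `alt_` would affect only newly added enrichment fields.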
Enrich Alt Phone
This step validates, standardizes, and enriches phone number data for each contact’s alternate phone number, stored in the phone_number_alt field in the unified dataset.
You do not need to make changes to this step.
See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.
Prepare Data for Org Address
This step transforms the data in the unified dataset to match the expected inputs for the address enrichment service.
Additionally, the step renames the phone number enrichment fields returned for the alternate phone number to prepend each of these fields with alt_. For example, enriched_phone_carrier becomes alt_enriched_phone_carrier. These changes allow you to easily identify enrichment fields related to the alternate phone number.
By default, this enriched alternate phone number data is used in the clustering model, but is not included in the final mastering flow output. To include these fields in the output, modify the Consolidate Records step and the Deliver Data to Studio step.
Enrich Org Address
This step standardizes and validates address information for the contact’s organization, stored in the org_address_line_1, org_address_line_2, org_city, org_region, org_postal_code, and org_country fields in the unified dataset. Additionally, this step enriches addresses with latitude, longitude, and detailed address information. You do not need to make changes to this step.
See Address Standardization, Validation, and Geocoding for information about this enricher, including the output fields it adds to your data.
Prepare Data for Contact Address Enrichment
This step transforms the data in the unified dataset to match the expected inputs for the address enrichment service.
Additionally, the step renames the address enrichment fields returned for the contact’s organization address to prepend each of these fields with org_. For example, enriched_address_city becomes org_enriched_address_city. These changes allow you to easily identify enrichment fields related to the organization address.
Enrich Contact Address
This step standardizes and validates address information for the contact, stored in the contact_address_line_1, contact_address_line_2, contact_city, contact_region, contact_postal_code, and contact_country fields in the unified dataset. Additionally, this step enriches addresses with latitude, longitude, and detailed address information. You do not need to make changes to this step.
See Address Standardization, Validation, and Geocoding for information about this enricher, including the output fields it adds to your data.
Prepare for Clustering
This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
Additionally, the step renames the address enrichment fields returned for the contact’s address to prepend each of these fields with contact_. For example, enriched_address_city becomes contact_enriched_address_city. These changes allow you to easily identify enrichment fields related to the contact’s address.
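To illustrate how an ml_ field can combine unified source fields with enrichment fields into one array, here is a minimal Python sketch. The `ml_phone` field name, the `build_ml_array` helper, the choice of input fields, and the rule of dropping empty and repeated values are all assumptions for illustration. The actual ml_ fields and their construction are defined by the template.

```python
def build_ml_array(record, field_names):
    """Collect non-empty values from `field_names` into a de-duplicated list."""
    values = []
    for name in field_names:
        value = record.get(name)
        if value and value not in values:
            values.append(value)
    return values


record = {
    "phone_number": "617-555-0100",                            # made-up sample value
    "phone_number_alt": "",
    "primary_enriched_phone_national_format": "(617) 555-0100",
}
# Hypothetical model input combining source and enriched phone values.
record["ml_phone"] = build_ml_array(
    record,
    ["phone_number", "phone_number_alt", "primary_enriched_phone_national_format"],
)
print(record["ml_phone"])  # prints ['617-555-0100', '(617) 555-0100']
```

Holding several candidate values in one array lets a comparison pick the closest pair across two records, rather than relying on a single field.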
Apply Clustering Model
This read-only step groups records that refer to the same entity into a cluster, using the trained model.
Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Available Published Datasets.
Consolidate Records
This step applies rules to produce a single record, called the mastered entity record, that best represents a cluster. For most fields, these rules select the most common value from the clustered records.
This step also creates the All Roles mastered entity field, which provides professional title and related organization name values from clustered source records. For example:
All Roles: SVP Data - Goldman Sachs; SVP Data & Analytics, JP Morgan; VP Engineering - BofA
Additionally, this step adds a Tamr ID (tamr_id) to each entity. The Tamr ID is a unique, persistent ID.
You do not need to modify this step unless:
- You added output fields in the Align to Contacts Data Model step and want those fields to be included in the final output for the data product.
- You want to add fields returned by the phone enrichment service for alternate phone numbers.
See Modifying Record Consolidation Transformations.
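The consolidation rules described in this section can be approximated in Python as follows. This is a sketch under assumptions, not the template's actual transformations: the helper names, the tie-breaking behavior (the value seen first wins), and the `title - organization` format joined with semicolons are chosen for illustration.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if there is none.

    ASSUMPTION: ties go to the value seen first; the rules above do not
    describe tie-breaking.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    return Counter(non_empty).most_common(1)[0][0]


def all_roles(records):
    """Join distinct "title - organization" pairs from clustered records."""
    roles = []
    for rec in records:
        title, org = rec.get("professional_title"), rec.get("org_name")
        if title and org:
            role = f"{title} - {org}"
            if role not in roles:
                roles.append(role)
    return "; ".join(roles)


cluster = [
    {"contact_city": "Boston", "professional_title": "SVP Data", "org_name": "Goldman Sachs"},
    {"contact_city": "Boston", "professional_title": "VP Engineering", "org_name": "BofA"},
    {"contact_city": "Cambridge", "professional_title": "SVP Data", "org_name": "Goldman Sachs"},
]
print(most_common_value([r["contact_city"] for r in cluster]))  # prints Boston
print(all_roles(cluster))  # prints SVP Data - Goldman Sachs; VP Engineering - BofA
```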
Deliver Data to Studio
This step allows you to configure how data appears in Studio, Curator, and published datasets. See Configuring Data Display in Studio.
By default, the Deliver Data to Studio step is configured by group as follows:
Contact:
Unified Field | Mapped Display Name |
---|---|
tamr_persistent_id | Tamr ID |
full_name | Full Name |
professional_title | Title |
company_name | Company |
most_common_origin_email | |
phone_number | Phone |
contact_address_line_1 | Address Line 1 |
contact_city | City |
contact_region | Region |
contact_postal_code | Postal Code |
contact_country | Country |
role_history | All Roles |
entityID | Entity ID |
Organization:
Unified Field | Mapped Display Name |
---|---|
org_name | Organization Name |
org_address_line_1 | Organization Address |
org_city | Organization City |
org_region | Organization Region |
org_postal_code | Organization Postal Code |
org_country | Organization Country |
Enriched Phone Information:
Unified Field | Mapped Display Name |
---|---|
primary_enriched_phone_national_format | Enriched Phone |
primary_enriched_phone_type | Enriched Phone Type |
Enriched Contact Address Information:
Unified Field | Mapped Display Name |
---|---|
contact_enriched_full_address | Enriched Contact Address |
contact_enriched_address_city | Enriched Contact City |
contact_enriched_address_region | Enriched Contact Region |
Enriched Organization Information:
Unified Field | Mapped Display Name |
---|---|
org_enriched_full_address | Enriched Organization Address |
org_enriched_address_city | Enriched Organization City |
org_enriched_address_region | Enriched Organization Region |
org_enriched_address_postal_code_primary | Enriched Organization Postal Code |
org_enriched_address_country_name | Enriched Organization Country |