Contacts Mastering

Use the Contacts Mastering template to master marketing data, such as B2C contacts, using an industry-standard schema and a trained machine learning model. The template also enriches phone numbers and contact and organization addresses, helping to ensure that you have the most complete and up-to-date information.

Input Dataset Requirements

As part of the mastering flow, Tamr Cloud aligns your input datasets to a unified schema with predefined output fields. Tamr Cloud uses these predefined output fields to enrich your data and consolidate similar records into entities.

In addition to the general Requirements for Input Datasets, certain data is required for Contacts mastering.

You must map one or more input fields to each of the predefined output fields:

  • contact_address_line_1
  • contact_address_line_2
  • contact_city
  • contact_country
  • contact_postal_code
  • contact_region
  • email
  • fax_number
  • first_name
  • last_name
  • middle_name
  • name_prefix
  • name_suffix
  • org_address_line_1
  • org_address_line_2
  • org_city
  • org_country
  • org_name
  • org_name_alt
  • org_postal_code
  • org_region
  • phone_number
  • phone_number_alt
  • professional_title
  • trusted_id
    This is a unique identifier, such as a customer or contact ID. The clustering model considers records with the same value for this field to be a match, so map a field to this output field only if records that share a value should be considered a match. If the values in the input do not represent a definite match, map an empty placeholder field to trusted_id, then add the following transformation to the Create tamr_record_id step: select *, null as trusted_id.
  • unique_key
    This is the primary key for the dataset. See About Primary Keys for more information.
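Because every predefined output field needs at least one mapped input field, it can help to check a planned mapping before you build the flow. The following is a minimal sketch of such a check. It is not part of Tamr Cloud: the find_unmapped_fields helper, the dictionary layout, and the src_ input field names are all hypothetical and used only for illustration.

```python
# Predefined output fields listed above for the Contacts Mastering template.
REQUIRED_OUTPUT_FIELDS = [
    "contact_address_line_1", "contact_address_line_2", "contact_city",
    "contact_country", "contact_postal_code", "contact_region",
    "email", "fax_number", "first_name", "last_name", "middle_name",
    "name_prefix", "name_suffix",
    "org_address_line_1", "org_address_line_2", "org_city", "org_country",
    "org_name", "org_name_alt", "org_postal_code", "org_region",
    "phone_number", "phone_number_alt", "professional_title",
    "trusted_id", "unique_key",
]


def find_unmapped_fields(mapping):
    """Return the output fields that have no input field mapped to them.

    `mapping` is a hypothetical dict: output field name -> list of input field names.
    """
    return [field for field in REQUIRED_OUTPUT_FIELDS if not mapping.get(field)]


# Hypothetical planned mapping; the src_ names are made up for this example.
example_mapping = {field: [f"src_{field}"] for field in REQUIRED_OUTPUT_FIELDS}
example_mapping["fax_number"] = []  # simulate a mapping that was forgotten

print(find_unmapped_fields(example_mapping))
```

An empty placeholder input field still counts as a mapping, which is how the trusted_id case described above can be satisfied.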

Clustering Model

For Contacts mastering, the clustering model considers the similarity of values for the following fields, and then uses decision-tree logic to accurately identify records that refer to the same entity:

  • Name
  • Street address
  • Phone number
  • User email

Except in rare edge cases, records with the same trusted ID are clustered together, while records with different trusted IDs are not clustered together.

Tamr Cloud looks for similarities in these fields, not exact matches. For example, two addresses on the same street can correspond to the same person.
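To illustrate the difference between similarity and exact matching, the sketch below scores address strings with Python's standard difflib module. This is only an illustration of the general idea. It is not the trained Tamr Cloud model, and the similarity function and the sample addresses are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score from 0.0 to 1.0, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Made-up sample addresses: the first pair differs only in abbreviation.
pairs = [
    ("12 Main Street", "12 Main St."),
    ("12 Main Street", "98 Oak Avenue"),
]

for left, right in pairs:
    print(f"{left!r} vs {right!r}: exact={left == right}, "
          f"similarity={similarity(left, right):.2f}")
```

Neither pair is an exact match, but the first pair scores noticeably higher than the second, which is the kind of signal a similarity-based model can use.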

Modifying the Mastering Flow

Designer Flow Step Overview

When you create an entity type using the Contacts mastering template, Tamr Cloud creates a mastering flow in Designer with steps specific to Contacts mastering:

Add Data

Ensure your input data meets both the general requirements for input datasets and the specific requirements for this template.

Then, see Adding a Data Source.

Align to Contacts Data Model

See Mapping Input Fields to a Unified Schema.

Create tamr_record_id

This transformation step gives each source record a primary key that is unique across all source datasets. It concatenates the source dataset name and the record's unique key, separated by an underscore, into a new primary key field: tamr_record_id.

Important: If records within the same source dataset have duplicate primary key values, the tamr_record_id values for those records are also duplicates.

You do not need to modify this step.
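The concatenation described above, and the duplicate-key caveat, can be sketched in a few lines of Python. This is an illustration only, not the transformation that Tamr Cloud runs. The make_tamr_record_id helper and the dataset names are hypothetical.

```python
from collections import Counter


def make_tamr_record_id(dataset_name, unique_key):
    """Join the source dataset name and the unique key with an underscore."""
    return f"{dataset_name}_{unique_key}"


# (dataset name, unique key) pairs; names and keys are made up for this example.
records = [
    ("crm_contacts", "1001"),
    ("web_signups", "1001"),   # same key, different dataset: still unique
    ("crm_contacts", "1002"),
    ("crm_contacts", "1002"),  # duplicate key within one dataset
]

ids = [make_tamr_record_id(dataset, key) for dataset, key in records]
duplicates = sorted(rid for rid, count in Counter(ids).items() if count > 1)

print(ids)
print(duplicates)
```

The same key in two different datasets produces two distinct IDs, while a key repeated within one dataset produces a repeated ID, so duplicates of that kind need to be resolved in the source data.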

Prepare Data for Primary Phone Enrichment

This step transforms the data in the unified dataset to match the expected inputs for the phone number enrichment service.

You do not need to make changes to this step.

Enrich Primary Phone

This step validates, standardizes, and enriches phone number data for each contact’s primary phone number, stored in the phone_number field in the unified dataset. You do not need to make changes to this step.

See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.

Prepare Data for Alt Phone Enrichment

This step prepares the unified dataset ahead of enriching phone number data for each contact’s alternate phone number, stored in the phone_number_alt field in the unified dataset.

This step renames the phone number enrichment fields returned for the primary phone number to prepend each of these fields with primary_. For example, enriched_phone_carrier becomes primary_enriched_phone_carrier. These changes allow you to easily identify enrichment fields related to the primary phone number.

You do not need to make changes to this step.
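As a rough picture of the renaming described above, the sketch below prepends a prefix to every field whose name starts with enriched_. It is a simplified illustration, not the step's actual transformation. The prefix_enrichment_fields helper, the enriched_ naming assumption, and the sample values are hypothetical.

```python
def prefix_enrichment_fields(record, prefix, marker="enriched_"):
    """Return a copy of `record` with `prefix` prepended to enrichment field names.

    Assumes, for this sketch only, that enrichment fields start with `marker`.
    """
    renamed = {}
    for name, value in record.items():
        new_name = prefix + name if name.startswith(marker) else name
        renamed[new_name] = value
    return renamed


# Made-up row after primary phone enrichment.
row = {"phone_number": "617-555-0100", "enriched_phone_carrier": "Example Carrier"}
primary_row = prefix_enrichment_fields(row, "primary_")

print(primary_row)
```

Renaming before the next enrichment runs keeps the primary phone results from colliding with the fields that the alternate phone enrichment returns under the same names.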

Enrich Alt Phone

This step validates, standardizes, and enriches phone number data for each contact’s alternate phone number, stored in the phone_number_alt field in the unified dataset.

You do not need to make changes to this step.

See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.

Prepare Data for Org Address Enrichment

This step transforms the data in the unified dataset to match the expected inputs for the address enrichment service.

Additionally, the step renames the phone number enrichment fields returned for the alternate phone number to prepend each of these fields with alt_. For example, enriched_phone_carrier becomes alt_enriched_phone_carrier. These changes allow you to easily identify enrichment fields related to the alternate phone number.

By default, this enriched alternate phone number data is used in the clustering model, but is not included in the final mastering flow output. You can modify the Consolidate Records step and the Deliver to Studio step to include additional fields.

Enrich Org Address

This step standardizes and enriches address data for the contact’s organization, stored in the org_address_line_1, org_address_line_2, org_city, org_region, org_postal_code, and org_country fields in the unified dataset. You do not need to make changes to this step.

See Address Enrichment for information about this enricher, including the output fields it adds to your data.

Prepare Data for Contact Address Enrichment

This step transforms the data in the unified dataset to match the expected inputs for the address enrichment service.

Additionally, the step renames the address enrichment fields returned for the contact’s organization address to prepend each of these fields with org_. For example, enriched_address_city becomes org_enriched_address_city. These changes allow you to easily identify enrichment fields related to the organization address.

Enrich Contact Address

This step standardizes and enriches the contact’s address data, stored in the contact_address_line_1, contact_address_line_2, contact_city, contact_region, contact_postal_code, and contact_country fields in the unified dataset. You do not need to make changes to this step.

See Address Enrichment for information about this enricher, including the output fields it adds to your data.

Prepare for Clustering

This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.

Additionally, the step renames the address enrichment fields returned for the contact’s address to prepend each of these fields with contact_. For example, enriched_address_city becomes contact_enriched_address_city. These changes allow you to easily identify enrichment fields related to the contact’s address.
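The idea of combining source and enriched values into an array, then comparing arrays by their most similar pair of values, can be sketched as follows. This is a loose illustration, not the actual transformation or model. The ml_phone name, the choice of fields, the de-duplication of values, and the difflib-based score are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def build_ml_array(record, field_names):
    """Collect the non-empty values of the named fields, dropping repeats."""
    values = []
    for name in field_names:
        value = record.get(name)
        if value and value not in values:
            values.append(value)
    return values


def best_pair_similarity(values_a, values_b):
    """Return the highest similarity found between any pair of values."""
    scores = [
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a in values_a
        for b in values_b
    ]
    return max(scores, default=0.0)


# Hypothetical choice of fields feeding a hypothetical ml_phone array.
PHONE_FIELDS = ["phone_number", "primary_enriched_phone_national_format", "phone_number_alt"]

record_a = {"phone_number": "6175550100",
            "primary_enriched_phone_national_format": "(617) 555-0100",
            "phone_number_alt": None}
record_b = {"phone_number": "(617) 555-0100",
            "primary_enriched_phone_national_format": "(617) 555-0100",
            "phone_number_alt": ""}

ml_phone_a = build_ml_array(record_a, PHONE_FIELDS)
ml_phone_b = build_ml_array(record_b, PHONE_FIELDS)

print(ml_phone_a, ml_phone_b, best_pair_similarity(ml_phone_a, ml_phone_b))
```

In this sketch the raw phone numbers differ in formatting, but the enriched value in one array matches a value in the other, so the best pair scores as identical.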

Apply Clustering Model

This read-only step groups records that refer to the same entity into a cluster, using the trained model.

Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Available Published Datasets.

Consolidate Records

This step applies rules to produce a single entity record that best represents a cluster. For most fields, these rules select the most common value from the clustered records.

Additionally, this step adds a Tamr ID (tamr_id) to each entity. The Tamr ID is a unique, persistent ID.
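The most-common-value rule used for most fields can be sketched with Python's collections.Counter. This is an illustration only, not the step's actual rules. The helper names are hypothetical, and skipping empty values and breaking ties by first occurrence are assumptions made for the example.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if there is none.

    Ties go to the value seen first (an assumption made for this sketch).
    """
    present = [value for value in values if value not in (None, "")]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]


def consolidate(cluster, fields):
    """Build one entity record from the records in a cluster."""
    return {
        field: most_common_value([record.get(field) for record in cluster])
        for field in fields
    }


# Made-up cluster of three source records for the same person.
cluster = [
    {"first_name": "Ann", "contact_city": "Boston"},
    {"first_name": "Ann", "contact_city": None},
    {"first_name": "Anne", "contact_city": "Boston"},
]

entity = consolidate(cluster, ["first_name", "contact_city"])
print(entity)
```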

You do not need to modify this step unless:

  • You added output fields in the Align to Contacts Data Model step and want those fields to be included in the final output for the entity type.
  • You want to add fields returned by the phone enrichment service for alternate phone numbers.

See Modifying Transformations for Your Data for instructions on adding fields in this step.

Deliver to Studio

This step allows you to configure how entity data appears in Studio, Curator, and published datasets. See Configuring Data Display in Studio.

By default, the Deliver to Studio step is configured with the following groups:

Contact:

Unified Field               Mapped Display Name
tamr_persistent_id          Tamr ID
full_name                   Full Name
professional_title          Title
company_name                Company
most_common_origin_email    Email
phone_number                Phone
contact_address_line_1      Address Line 1
contact_city                City
contact_region              Region
contact_postal_code         Postal Code
contact_country             Country
entityID                    Entity ID

Organization:

Unified Field         Mapped Display Name
org_name              Organization Name
org_address_line_1    Organization Address
org_city              Organization City
org_region            Organization Region
org_postal_code       Organization Postal Code
org_country           Organization Country

Enriched Phone Information:

Unified Field                             Mapped Display Name
primary_enriched_phone_national_format    Enriched Phone
primary_enriched_phone_type               Enriched Phone Type

Enriched Contact Address Information:

Unified Field                      Mapped Display Name
contact_enriched_full_address      Enriched Contact Address
contact_enriched_address_city      Enriched Contact City
contact_enriched_address_region    Enriched Contact Region

Enriched Organization Information:

Unified Field                               Mapped Display Name
org_enriched_full_address                   Enriched Organization Address
org_enriched_address_city                   Enriched Organization City
org_enriched_address_region                 Enriched Organization Region
org_enriched_address_postal_code_primary    Enriched Organization Postal Code
org_enriched_address_country_name           Enriched Organization Country