Steps Completed by B2B Customers

When you create a data product using the B2B customers template, Tamr Cloud creates a mastering flow with steps specific to mastering and enriching business customer data.

The following table describes each step in the B2B customers flow, and explains which steps usually need to be edited for your data.

If you need to make changes to the mastering flow beyond those described in this documentation, contact Support at [email protected] for assistance.

Usually Requires Changes | Step | Description
✅
Add Data You verify that source data meets both general requirements and template-specific requirements. Then, you add source data.
✅
Align to Customer Data Model You map input columns to attributes in the supplied schema. If you add attributes to the unified schema, you will need to update the following steps, as described in the rows below, to ensure that these attributes appear in your final mastered entities:
  • Consolidate Records
  • Configure Attributes
❌
Create tamr_record_id This transformation step ensures that each source record has a unique primary key across all source datasets by adding a new primary key field: tamr_record_id. The tamr_record_id is a 128-bit hash value of the source dataset name and the source primary key. See the Tamr Core documentation for a description of the function used to generate the hash value.

Important: If records within the same source dataset have duplicate primary key values, those records will also have duplicate tamr_record_id values.

If you mapped an empty placeholder column to the trusted_id attribute, add the following transformation: SELECT *, '' as trusted_id;.
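As a rough illustration of how a key like tamr_record_id could be derived, the following Python sketch hashes the source dataset name and source primary key into a 128-bit value. MD5 and the "|" separator are assumptions, chosen only because MD5 produces 128 bits; the function Tamr actually uses is described in the Tamr Core documentation. The dataset names and the make_record_id helper are hypothetical.

```python
import hashlib


def make_record_id(dataset_name: str, primary_key: str) -> str:
    """Return a 128-bit hash (32 hex characters) of dataset name + primary key.

    Hypothetical stand-in: MD5 and the "|" separator are assumptions,
    not the function Tamr uses.
    """
    raw = f"{dataset_name}|{primary_key}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


# The same primary key in two different datasets yields different IDs.
id_a = make_record_id("crm_accounts", "1001")
id_b = make_record_id("erp_customers", "1001")

# A duplicate primary key within one dataset yields a duplicate ID.
id_dup = make_record_id("crm_accounts", "1001")
```

Because the hash depends only on the dataset name and the primary key, id_a and id_dup are identical, which is why duplicate primary keys should be resolved in the source data before it is added.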
❌
Prepare Data for Address Enrichment This step transforms the data in the unified dataset to match the expected inputs for the address enrichment service.
❌
Standardize URL This step provides a cleaned version of the website domain in url attribute values. The cleaned domain is available in the Website Domain field in the mastered entity. See URL Standardization and Cleaning.
❌
Enrich Address This step standardizes and validates address information, and enriches addresses with latitude, longitude, and detailed address information. See Address Standardization, Validation, and Geocoding.
❌
Prepare Address Fields This step transforms the data in address fields in the unified dataset to prepare for clustering.
❌
Prepare Data for Phone Enrichment This step transforms the data in the unified dataset to match the expected inputs to the phone number data quality service included in the mastering flow. This step replaces empty country values in source records with the country returned by the Enrich Address step.
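The country fallback described for this step can be sketched in Python as follows. This is a minimal illustration, not the transformation Tamr runs; the fill_country helper and its parameter names are hypothetical.

```python
from typing import Optional


def fill_country(source_country: Optional[str],
                 enriched_country: Optional[str]) -> Optional[str]:
    """Keep the source country unless it is empty; otherwise use the
    country returned by address enrichment (hypothetical helper)."""
    if source_country is None or not source_country.strip():
        return enriched_country
    return source_country


kept = fill_country("DE", "FR")    # source value is kept
filled = fill_country("", "FR")    # empty source value is replaced
```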
❌
Enrich Phone This step validates, standardizes, and enriches phone number data. See Phone Number Enrichment.
❌
Prepare Fields for Tamr Enrich ID Step This step prepares fields to be used by the Tamr Enrich ID step.
❌
Enrich Company Name This step cleans and enriches company name data. See Company Name Enrichment.
❌
Tamr Enrich ID This step matches each company in your source datasets to a Tamr Enrich ID. If a matching Tamr Enrich ID is identified, the ID is used by the clustering model to identify duplicate source records. The ID is also used by the Tamr Enrich step to add the referential data from your selected data provider to your mastered entities, if available.

The step looks for a matching Tamr Enrich ID based on the following attributes, and prioritizes matches available in the public sources:
  1. First, by company name and full address.
  2. Next, by company name, city, country, and either region or postal code.
  3. Next, by company name and country.
  4. Next, by the cleaned phone number.
  5. Finally, by the cleaned website.
See Tamr Enrich ID for information about this enricher, including the output fields it adds to your data.
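The priority order above can be pictured as a cascade that tries each attribute combination in turn and stops at the first match. The Python sketch below is only an illustration under stated assumptions: the field names, the lookup callback, and the example IDs are hypothetical, and the sketch does not model how matches from the public sources are prioritized.

```python
from typing import Callable, Optional, Tuple

# Attribute combinations in priority order. Field names are hypothetical.
MATCH_TIERS = [
    ("name", "full_address"),
    ("name", "city", "country", "region"),
    ("name", "city", "country", "postal_code"),
    ("name", "country"),
    ("phone",),
    ("website",),
]

Lookup = Callable[[Tuple[str, ...], Tuple[str, ...]], Optional[str]]


def find_enrich_id(record: dict, lookup: Lookup) -> Optional[str]:
    """Return the first ID found by walking the tiers in order."""
    for fields in MATCH_TIERS:
        values = tuple(record.get(f) for f in fields)
        if not all(values):
            continue  # skip tiers with missing attribute values
        match = lookup(fields, values)
        if match is not None:
            return match
    return None


# Toy reference index keyed by (fields, values); the IDs are made up.
index = {
    (("name", "country"), ("Acme", "US")): "TE-1",
    (("phone",), ("+15550100",)): "TE-2",
}
record = {"name": "Acme", "country": "US", "phone": "+15550100"}
result = find_enrich_id(record, lambda f, v: index.get((f, v)))
```

In this toy run the record matches on both name plus country and on phone, and the name plus country match wins because that tier comes earlier in the list.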
❌
Prepare for Clustering This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records.

The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
❌
Apply Clustering Model This step groups records that refer to the same entity into a cluster, using the trained model. See Features of B2B Customers.
✅
Consolidate Records This step applies rules to produce a single record, called the mastered entity record, that best represents a cluster. For most fields, these rules select the most common value from the clustered records.

Additionally, this step adds a Tamr ID (tamr_id) to each mastered entity record. The Tamr ID is a unique, persistent ID.

If you added new attributes in the Schema Mapping step, add lines in the transformations to tell Tamr Cloud what value to set for each attribute when creating the mastered entity. See Modifying Record Consolidation Transformations.
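The "most common value" rule can be sketched in Python as shown below. This illustrates the idea only and does not use Tamr's transformation syntax; the helper names are hypothetical, and ignoring empty values is an assumption.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if there is none."""
    non_empty = [v for v in values if v not in (None, "")]  # assumption
    if not non_empty:
        return None
    return Counter(non_empty).most_common(1)[0][0]


def consolidate(cluster, attributes):
    """Build one representative record from a cluster of source records."""
    return {a: most_common_value([r.get(a) for r in cluster])
            for a in attributes}


cluster = [
    {"name": "Acme Inc", "city": "Boston"},
    {"name": "Acme Inc", "city": ""},
    {"name": "ACME", "city": "Boston"},
]
entity = consolidate(cluster, ["name", "city"])
```

An attribute added to the unified schema would need its own entry in the attributes list here, which mirrors why new attributes need their own lines in the consolidation transformations.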
✅
Tamr Enrich This step enriches each mastered company entity that has a Tamr Enrich ID with referential data from a public source, if available. By default, Tamr enriches each company with the best match from these public sources.

For example, if Tamr identifies a match in both the public sources and in BoldData, this step enriches your data with the match from the public sources, even if the BoldData match is a higher confidence match.

See Public Source Enrichment for the attributes added by the public sources.

Note: If you would like to use a data provider other than the Public Sources service, contact Tamr at [email protected] to discuss data provider entitlement.
❌
Calculate Similarity Scores This step calculates the similarity between attribute values in the mastered entity and the attribute values returned by your selected data provider. This step provides both an average similarity between the mastered entity and the values returned by the data provider, and attribute-level similarity for key attributes.

Scores range from 0 to 1, with 0 being low similarity and 1 being high similarity. To calculate these scores, the step first splits the text into tokens on spaces and special characters (except the underscore character, _), removes special characters, and converts all letters to lowercase. Then, the step uses a Jaccard similarity function to evaluate the similarity between tokens, returning a score between 0 and 1.
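The tokenization and scoring described above can be approximated with the following Python sketch. It assumes the Jaccard function compares sets of unique tokens and that two empty values score 0; neither detail is confirmed here, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it on anything that is not a letter,
    digit, or underscore, dropping empty tokens."""
    return {t for t in re.split(r"\W+", text.lower()) if t}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0  # assumption: two empty values are not considered similar
    return len(ta & tb) / len(ta | tb)


same = jaccard("Acme Corp.", "ACME corp")           # tokens are identical
partial = jaccard("Acme Holdings, Inc.", "Acme Inc")  # 2 shared of 3 distinct
```

In the second comparison the token sets are {acme, holdings, inc} and {acme, inc}, so two of three distinct tokens are shared and the score is about 0.67.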

Unlike other scores, phone number and website similarity are either 0 or 1. A similarity of 1 is returned if the cleaned source phone number or website and the phone number or website returned by the data provider are an exact match, or if one value is contained in the other. Otherwise, the similarity is 0.
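The all-or-nothing rule for phone numbers and websites can be sketched in Python as below. The inputs are assumed to be already cleaned, the function name is hypothetical, and treating a missing value as a non-match is an assumption.

```python
def binary_similarity(source_value: str, provider_value: str) -> int:
    """Return 1 if the values match exactly or one contains the other,
    otherwise 0."""
    if not source_value or not provider_value:
        return 0  # assumption: a missing value never counts as a match
    if source_value in provider_value or provider_value in source_value:
        return 1  # an exact match is also a containment match
    return 0


exact = binary_similarity("acme.com", "acme.com")
contained = binary_similarity("acme.com", "shop.acme.com")
different = binary_similarity("acme.com", "example.org")
```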

You can review these scores on the Entity Details page after the flow runs, and investigate entities with low similarity scores.

This step calculates the following similarity scores:
  • Tamr Enrich Average Similarity
  • Tamr Enrich Name Similarity
  • Tamr Enrich Alternate Name Similarity
  • Tamr Enrich Address Line 1 Similarity
  • Tamr Enrich City Similarity
  • Tamr Enrich Region Similarity
  • Tamr Enrich Postal Code Similarity
  • Tamr Enrich Phone Number Similarity
  • Tamr Enrich Website Similarity
✅
Configure Attributes You configure how mastered entity attributes appear in Tamr Cloud and published datasets. If you added new attributes in the Schema Mapping step, add and map those attributes in this step to include them in your final mastered entity output. See Configuring Data Display.