B2B Site Mastering

You use the B2B Site Mastering template to master company data by branch or location.

The B2B Site Mastering template uses an industry-standard schema and a trained machine learning model to build a common view of your customers. Use this template if you need to differentiate between various company locations. The template provides company name, phone number, and address enrichment data, helping ensure you have the most complete and up-to-date information for each company.

Input Dataset Requirements

As part of the mastering flow, Tamr Cloud aligns your input datasets to a unified schema with predefined output fields. Tamr Cloud uses these predefined output fields to enrich your data and consolidate similar records into entities.

In addition to the general Requirements for Input Datasets, certain data is required for B2B Site Mastering.

You must map one or more input fields to each of the predefined output fields:

  • uniqueKey
    This is the primary key for the dataset. See About Primary Keys for more information.
  • company_name
  • alternative_names
  • address_line_1
  • address_line_2
  • city
  • region
  • postal_code
  • phone
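As a rough illustration of this requirement, the following Python sketch checks whether a mapping covers every predefined output field. The mapping structure, the input column names, and the unmapped_fields helper are hypothetical and are not part of Tamr Cloud; in practice you perform the mapping in the Tamr Cloud interface.

```python
# Predefined output fields listed above for B2B Site Mastering.
REQUIRED_OUTPUT_FIELDS = [
    "uniqueKey",
    "company_name",
    "alternative_names",
    "address_line_1",
    "address_line_2",
    "city",
    "region",
    "postal_code",
    "phone",
]


def unmapped_fields(mapping):
    """Return the required output fields that have no input field mapped."""
    return [field for field in REQUIRED_OUTPUT_FIELDS if not mapping.get(field)]


# Hypothetical mapping for one source dataset: output field -> input columns.
example_mapping = {
    "uniqueKey": ["account_id"],
    "company_name": ["name", "legal_name"],
    "address_line_1": ["street"],
    "city": ["city"],
    "region": ["state"],
    "postal_code": ["zip"],
    "phone": ["main_phone"],
}

missing = unmapped_fields(example_mapping)
print(missing)
```

In this hypothetical mapping, alternative_names and address_line_2 are reported as missing because no input field is mapped to them.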

Clustering Model

For B2B Site Mastering, the clustering model deduplicates your data by considering the similarity of values for the following fields and using decision-tree logic to accurately identify records that refer to the same entity:

  • Company name and alternate company names
  • Full address
  • Street address
  • City
  • State or region
  • Country
  • Postal code

Tamr Cloud looks for similarities in these fields, not exact matches. For example, two addresses on the same street can correspond to the same site.
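To make the difference between similarity and exact matching concrete, here is a minimal Python sketch that scores address strings with difflib from the standard library. This is not the Tamr Cloud clustering model, which is a trained model that considers several fields together; the similarity function is a stand-in chosen for illustration. The address values are taken from the clustering examples in this topic.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 and 1.0; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Same site, written two ways: not an exact match, but highly similar.
same_site = similarity("7411 RIGGS ROAD STE 214", "7411 RIGGS RD STE 214")

# Two different sites of the same company: much less similar.
different_site = similarity("11600 NEBEL ST STE 202", "5010 NICHOLSON LN STE 200")

print(f"same site: {same_site:.2f}, different site: {different_site:.2f}")
```

An exact-match comparison would treat both pairs as non-matching, while a similarity score separates them.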

Clustering Examples for B2B Site Mastering

Examples of Source Records Clustered Together

These records are clustered together because they have the same or highly similar values for company name and address fields, indicating that they most likely represent the same company site.

Column Record 1 Value Record 2 Value Record 3 Value
company_name RAMSGATE CORPORATION RAMSGATE CORPORATION RAMSGATE CORPORATION
address_line_1 7411 RIGGS ROAD STE 214 7411 RIGGS RD STE 214 7411 RIGGS ROAD STE 214
address_line_2
alternative_names
city HYATTSVILLE HYATTSVILLE
country USA: UNITED STATES OF AMERICA USA: UNITED STATES OF AMERICA USA: UNITED STATES OF AMERICA
phone 3018305395
postal_code 20783 20783 20783
region MD MD MD

Examples of Source Records Not Clustered Together

These records are not clustered together because of highly dissimilar values in address_line_1, indicating that they most likely represent different company sites.

Column Record 1 Value Record 2 Value
company_name G-W MANAGEMENT SERVICES, LLC G-W MANAGEMENT SERVICES, LLC
address_line_1 11600 NEBEL ST STE 202 5010 NICHOLSON LN STE 200
address_line_2
alternative_names
city ROCKVILLE
country USA: UNITED STATES OF AMERICA USA: UNITED STATES OF AMERICA
phone
postal_code 20852 20852
region MD MD

Modifying the Mastering Flow

Designer Flow Step Overview

When you create a data product using the B2B Site Mastering template, Tamr Cloud creates a mastering flow in Designer with steps specific to site mastering:

Add Data

Ensure your input data meets both the general requirements for input datasets and the specific requirements for this template.

Then, see Adding a Dataset.

Align to Customer Model

See Mapping Input Fields to a Unified Schema.

Create tamr_record_id

This transformation step ensures that each source record has a primary key that is unique across all source datasets. It creates a new primary key field, tamr_record_id, by concatenating the source dataset name and the record's uniqueKey value, separated by an underscore.

Transformation logic:

Create tamr_record_id transformation step.

Important: If records within the same source dataset have duplicate primary key values, the tamr_record_id values for those records are also duplicates.
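The concatenation can be sketched in Python as follows. This illustrates the described behavior only and is not the transformation syntax used in the flow; the dataset names, key values, and the make_tamr_record_id helper are hypothetical.

```python
def make_tamr_record_id(dataset_name, unique_key):
    """Join the source dataset name and the unique key with an underscore."""
    return f"{dataset_name}_{unique_key}"


# Hypothetical source records from two datasets.
records = [
    {"dataset": "crm_accounts", "uniqueKey": "1001"},
    {"dataset": "erp_vendors", "uniqueKey": "1001"},   # same key, different dataset
    {"dataset": "crm_accounts", "uniqueKey": "1001"},  # duplicate key within one dataset
]

ids = [make_tamr_record_id(r["dataset"], r["uniqueKey"]) for r in records]
print(ids)
```

The first two records receive different tamr_record_id values even though they share a key, because they come from different datasets. The first and third records receive the same value, because the duplicate key occurs within a single dataset.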

You do not need to modify this step.

Prepare Data for Enrichment

This step transforms the data in the unified dataset to match the expected inputs to the enrichers included in the mastering flow.

This step also adds the following fields to the unified schema:

  • full_address: This step concatenates all address values into a single full address.
  • google_lookup: This step generates a Google search link for the company, using the value in the company_name field.

You do not need to make changes to this step.
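A minimal Python sketch of how two such fields could be derived is shown below. The set of address fields, their order, the space separator, and the search URL format are all assumptions made for illustration; the step's actual logic may differ.

```python
from urllib.parse import quote_plus

# Assumed address fields and order; the actual step may include others, such as country.
ADDRESS_FIELDS = ["address_line_1", "address_line_2", "city", "region", "postal_code"]


def full_address(record):
    """Concatenate the non-empty address values into a single string."""
    parts = [str(record.get(field) or "").strip() for field in ADDRESS_FIELDS]
    return " ".join(part for part in parts if part)


def google_lookup(record):
    """Build a search link from company_name (the URL format is an assumption)."""
    return "https://www.google.com/search?q=" + quote_plus(record["company_name"])


record = {
    "company_name": "RAMSGATE CORPORATION",
    "address_line_1": "7411 RIGGS ROAD STE 214",
    "address_line_2": "",
    "city": "HYATTSVILLE",
    "region": "MD",
    "postal_code": "20783",
}

print(full_address(record))
print(google_lookup(record))
```

Empty values such as address_line_2 are skipped in this sketch so that the full address contains no doubled separators.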

Enrich Phone Number

This step validates, standardizes, and enriches phone number data. You do not need to make changes to this step.

See Phone Number Enrichment for information about this enricher, including the output fields it adds to your data.

Enrich Address

This step standardizes and enriches address data. You do not need to make changes to this step.

See Address Enrichment for information about this enricher, including the output fields it adds to your data.

Enrich Company Name

This step cleans and enriches company name data. You do not need to make changes to this step.

See Company Name Enrichment for information about this enricher, including the output fields it adds to your data.

Prepare for Clustering

This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
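The idea of comparing arrays of values and keeping the most similar pair can be sketched in Python as follows. This is an illustration only: the trained model's actual similarity functions and weights are not described here, so difflib is used as a stand-in, and the array contents and the best_similarity helper are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def best_similarity(values_a, values_b):
    """Return the highest pairwise similarity between two arrays of strings."""
    pairs = [(a, b) for a, b in product(values_a, values_b) if a and b]
    if not pairs:
        return 0.0  # nothing to compare
    return max(SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs)


# Hypothetical arrays that combine a source name with cleaned or alternate names.
names_record_1 = ["G-W MANAGEMENT SERVICES, LLC", "G-W MANAGEMENT SERVICES"]
names_record_2 = ["GW MGMT SVCS", "G-W MANAGEMENT SERVICES"]

score = best_similarity(names_record_1, names_record_2)
print(score)
```

Because each array holds several variants of the same attribute, two records can score as highly similar even when their original source values differ.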

Apply Clustering Model

This read-only step groups records that refer to the same entity into a cluster, using the trained model.

Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Available Published Datasets.

Consolidate Records

This step applies rules to produce a single record, called the mastered entity record, that best represents a cluster. For most fields, these rules select the most common value from the clustered records.
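The most-common-value rule can be sketched in Python as follows. This is not the consolidation transformation syntax; ignoring empty values and the most_common_value helper are assumptions made for illustration. The field values come from the clustered example in this topic.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if there is none."""
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return None
    return Counter(non_empty).most_common(1)[0][0]


# Three source records that were clustered together.
cluster = [
    {"company_name": "RAMSGATE CORPORATION", "address_line_1": "7411 RIGGS ROAD STE 214", "postal_code": "20783"},
    {"company_name": "RAMSGATE CORPORATION", "address_line_1": "7411 RIGGS RD STE 214", "postal_code": "20783"},
    {"company_name": "RAMSGATE CORPORATION", "address_line_1": "7411 RIGGS ROAD STE 214", "postal_code": "20783"},
]

fields = ["company_name", "address_line_1", "postal_code"]
mastered = {f: most_common_value([r.get(f) for r in cluster]) for f in fields}
print(mastered)
```

Here the mastered entity record takes the address_line_1 value that appears in two of the three source records.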

Additionally, this step adds a Tamr ID (tamr_id) to each mastered entity record. The Tamr ID is a unique, persistent ID.

If you added a new field in the Schema Mapping step, add a line in the transformations to tell Tamr Cloud what value to set for that field when creating the mastered entity record. See Modifying Record Consolidation Transformations for instructions on adding fields in this step.

Deliver to Studio

This step allows you to configure how entity data appears in Studio, Curator, and published datasets. See Configuring Data Display in Studio.