Features of B2B Site Mastering

This template enhances your data with standardized values and consolidates similar records into grouped entities.

About the Data Quality Services

The B2B site mastering template includes data quality services for these attributes:

The data quality process examines values for the template's attributes, and adds any resulting validated, standardized values to each record in new enrichment-specific attributes. The original values mapped from your source datasets remain present and unchanged. See the topics linked above for processing details and added attributes.
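As a rough illustration of this behavior (a standardized value in a new attribute, with the original value left alone), the following Python sketch standardizes a street address and stores the result separately. The abbreviation map, the standardize_address and enrich functions, and the address_line_1_standardized attribute name are all hypothetical. They are not the template's actual enrichers or added attributes.

```python
# Hypothetical abbreviation table; the template's real standardization rules are not shown here.
STREET_ABBREVIATIONS = {"ROAD": "RD", "STREET": "ST", "AVENUE": "AVE", "SUITE": "STE"}


def standardize_address(value: str) -> str:
    """Uppercase the value, collapse whitespace, and abbreviate common street words."""
    tokens = value.upper().split()
    return " ".join(STREET_ABBREVIATIONS.get(token, token) for token in tokens)


def enrich(record: dict) -> dict:
    """Return a copy of the record with the standardized value in a new attribute.

    The mapped source value in address_line_1 is left exactly as it was.
    """
    enriched = dict(record)  # shallow copy so the input record is not modified
    original = record.get("address_line_1")
    if original:
        # Hypothetical enrichment-specific attribute name.
        enriched["address_line_1_standardized"] = standardize_address(original)
    return enriched


source = {"company_name": "RAMSGATE CORPORATION", "address_line_1": "7411 Riggs Road Ste 214"}
result = enrich(source)
print(result["address_line_1"])               # 7411 Riggs Road Ste 214
print(result["address_line_1_standardized"])  # 7411 RIGGS RD STE 214
```

Keeping the standardized value in its own attribute means downstream steps can compare standardized values while the source value stays available for auditing.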

About the Clustering Model

The B2B site mastering model groups company records as follows:

First, by trusted_id. Records with the same trusted_id are always clustered together. Records with different trusted_ids are never clustered together.

Records with a null or empty trusted_id are clustered based on similarity, which means they can be clustered with records that do have a trusted_id.
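The trusted_id rules above can be expressed as a pre-check that runs before any similarity comparison. In this minimal Python sketch, the function names and the three-valued result (True, False, or None for "decide by similarity") are assumptions made for illustration, not part of Tamr Cloud.

```python
from typing import Optional


def normalize_trusted_id(value) -> Optional[str]:
    """Treat None and blank strings as 'no trusted_id'."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def trusted_id_rule(rec_a: dict, rec_b: dict) -> Optional[bool]:
    """Apply the trusted_id rules to a pair of records.

    True  -> same trusted_id: always clustered together
    False -> different trusted_ids: never clustered together
    None  -> at least one trusted_id is null/empty: decide by similarity
    """
    a = normalize_trusted_id(rec_a.get("trusted_id"))
    b = normalize_trusted_id(rec_b.get("trusted_id"))
    if a is not None and b is not None:
        return a == b
    return None


print(trusted_id_rule({"trusted_id": "3"}, {"trusted_id": "3"}))  # True
print(trusted_id_rule({"trusted_id": "1"}, {"trusted_id": "2"}))  # False
print(trusted_id_rule({"trusted_id": ""}, {"trusted_id": "3"}))   # None
```

The None case is what lets a record without a trusted_id join a cluster that already contains records with one.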

Then, by similarity. Records with null or empty trusted_ids are clustered based on similarities between values for these attributes:

  • Company name and alternate company names
  • Full address
  • Street address
  • City
  • State or region
  • Country
  • Postal code

Note: Generic descriptions, rather than specific attribute names, are listed to represent both the standard schema and the attributes added by the enrichers and other data transformations.
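To make "similarity between values for these attributes" more concrete, the following Python sketch scores a pair of records by averaging character-level similarity (difflib.SequenceMatcher) over the company names and alternate names, the full address, and each address part. This is a simplified, swapped-in technique for illustration only. The field names, the list type for alternative_names, and the equal weighting are assumptions; the actual clustering model is not described in this topic.

```python
from difflib import SequenceMatcher

# Hypothetical field names, modeled on the columns in this topic's example tables.
ADDRESS_FIELDS = ["address_line_1", "city", "region", "country", "postal_code"]


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def name_similarity(rec_a: dict, rec_b: dict) -> float:
    """Best score between any company name or alternate name of the two records."""
    names_a = [rec_a.get("company_name", "")] + rec_a.get("alternative_names", [])
    names_b = [rec_b.get("company_name", "")] + rec_b.get("alternative_names", [])
    return max(
        (text_similarity(x, y) for x in names_a for y in names_b if x and y),
        default=0.0,
    )


def full_address(rec: dict) -> str:
    """Join the non-empty address parts into a single string."""
    return " ".join(rec[field] for field in ADDRESS_FIELDS if rec.get(field))


def site_similarity(rec_a: dict, rec_b: dict) -> float:
    """Unweighted average of the name, full-address, and per-field scores."""
    scores = [
        name_similarity(rec_a, rec_b),
        text_similarity(full_address(rec_a), full_address(rec_b)),
    ]
    for field in ADDRESS_FIELDS:
        value_a, value_b = rec_a.get(field), rec_b.get(field)
        if value_a and value_b:  # skip parts that are empty on either side
            scores.append(text_similarity(value_a, value_b))
    return sum(scores) / len(scores)


ramsgate_1 = {
    "company_name": "RAMSGATE CORPORATION",
    "address_line_1": "7411 RIGGS ROAD STE 214",
    "city": "HYATTSVILLE",
    "region": "MD",
    "country": "USA: UNITED STATES OF AMERICA",
    "postal_code": "20783",
}
ramsgate_2 = dict(ramsgate_1, address_line_1="7411 RIGGS RD STE 214")
espresso = dict(ramsgate_1, company_name="ESPRESSO EXPRESS")

print(round(site_similarity(ramsgate_1, ramsgate_2), 3))
print(round(site_similarity(ramsgate_1, espresso), 3))
```

In this sketch the two Ramsgate records (same name, ROAD versus RD) score higher than a pair that shares an address but has different company names, because the name score drops sharply while the address scores stay high.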

Clustering Examples

Examples of Source Records Clustered Together

Example 1: These records are clustered together because they have highly similar values for the company name and address fields, indicating that they most likely represent the same company site.

Column Record 1 Value Record 2 Value Record 3 Value
company_name RAMSGATE CORPORATION RAMSGATE CORPORATION RAMSGATE CORPORATION
address_line_1 7411 RIGGS ROAD STE 214 7411 RIGGS RD STE 214 7411 RIGGS ROAD STE 214
address_line_2
alternative_names
city HYATTSVILLE HYATTSVILLE
country USA: UNITED STATES OF AMERICA USA: UNITED STATES OF AMERICA USA: UNITED STATES OF AMERICA
phone 3018305395
postal_code 20783 20783 20783
region MD MD MD
url
trusted_id

Expanding on this example: if the three records keep the addresses shown above but have the company names RAMSGATE CORPORATION, ESPRESSO EXPRESS, and RAMSGATE CORPORATION, Tamr Cloud clusters the two Ramsgate records together and does not include the Espresso Express record in that cluster.
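The Ramsgate and Espresso Express scenario can be reproduced with a generic approach: score every pair of records, link pairs that meet a threshold, and treat each connected group of links as a cluster (union-find). This Python sketch is an illustration under stated assumptions only. The similarity function, the 0.85 threshold, and the cluster function are hypothetical, and the sketch ignores trusted_id and the other attributes the real model considers.

```python
from difflib import SequenceMatcher
from itertools import combinations

MATCH_THRESHOLD = 0.85  # hypothetical cut-off, not a Tamr Cloud setting


def similarity(a: dict, b: dict) -> float:
    """Average character-level similarity of company name and street address."""
    name = SequenceMatcher(None, a["company_name"], b["company_name"]).ratio()
    street = SequenceMatcher(None, a["address_line_1"], b["address_line_1"]).ratio()
    return (name + street) / 2


def cluster(records: list, threshold: float = MATCH_THRESHOLD) -> list:
    """Link record pairs scoring at or above the threshold; return connected groups of indices."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


records = [
    {"company_name": "RAMSGATE CORPORATION", "address_line_1": "7411 RIGGS ROAD STE 214"},
    {"company_name": "ESPRESSO EXPRESS", "address_line_1": "7411 RIGGS RD STE 214"},
    {"company_name": "RAMSGATE CORPORATION", "address_line_1": "7411 RIGGS ROAD STE 214"},
]
print(cluster(records))  # [[0, 2], [1]]
```

Because linking is transitive, a chain of moderately similar records can end up in one group; this sketch does nothing to prevent that.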

Example 2: These records are clustered together because they have the same trusted_id value.

Column             Record 1 Value             Record 2 Value
company_name       LEGAL & GENERAL UCITS ETF  MALLINCKRODT
address_line_1     2 GRAND CANAL SQUARE       CRUISERATH, BLANCHARDSTOWN DUBLIN 15
address_line_2
alternative_names
city               DUBLIN                     DUBLIN
country            IE                         IE
phone
postal_code                                   D15 TX2V
region
url
trusted_id         3                          3

Examples of Source Records Not Clustered Together

Example 1: These records are not clustered together because of highly dissimilar values in address_line_1, indicating that they most likely represent different company sites.

Column Record 1 Value Record 2 Value
company_name G-W MANAGEMENT SERVICES, LLC G-W MANAGEMENT SERVICES, LLC
address_line_1 11600 NEBEL ST STE 202 5010 NICHOLSON LN STE 200
address_line_2
alternative_names
city ROCKVILLE
country USA: UNITED STATES OF AMERICA USA: UNITED STATES OF AMERICA
phone
postal_code 20852 20852
region MD MD
url
trusted_id

Example 2: These records are not clustered together because they have different trusted_id values.

Column             Record 1 Value             Record 2 Value
company_name       AMERICOLD                  AMERICOLD
address_line_1     10 Glenlake Pkwy Ste. 600  10 Glenlake Pkwy Ste. 600
address_line_2
alternative_names
city               Atlanta                    Atlanta
country            US                         US
phone
postal_code        30328                      30328
region             Georgia                    Georgia
url
trusted_id         1                          2