Features of B2B Site Mastering
This template enhances your data with standardized values and consolidates similar records into grouped entities.
About the Data Quality Services
The B2B site mastering template includes data quality services for these attributes:
The data quality process examines values for the template's attributes, and adds any resulting validated, standardized values to each record in new enrichment-specific attributes. The original values mapped from your source datasets remain present and unchanged. See the topics linked above for processing details and added attributes.
About the Clustering Model
The B2B site mastering model groups company records as follows:
First, by trusted_id. Records with the same trusted_id are always clustered together. Records with different trusted_id values are never clustered together.
Records with a null or empty trusted_id are clustered based on similarity, meaning that they may be clustered with records that have a trusted_id.
Then, by similarity. Records with a null or empty trusted_id are clustered based on similarities between values for these attributes:
- Company name and alternate company names
- Full address
- Street address
- City
- State or region
- Country
- Postal code
Note: Generic descriptions, rather than specific attribute names, are listed to represent both the standard schema and the attributes added by the enrichers and other data transformations.
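The trusted_id rules above can be sketched as a pairwise check. This is only an illustration of the documented behavior, not the template's implementation: the function names `normalize_id` and `pair_decision` are hypothetical, and treating whitespace-only values as empty is an assumption.

```python
from typing import Optional


def normalize_id(trusted_id: Optional[str]) -> Optional[str]:
    """Map None, empty, and whitespace-only values to None (assumption)."""
    if trusted_id is None:
        return None
    cleaned = str(trusted_id).strip()
    return cleaned or None


def pair_decision(id_a: Optional[str], id_b: Optional[str]) -> str:
    """Describe how the trusted_id rules constrain a pair of records.

    Returns "always" when both records share a trusted_id, "never" when
    both have a trusted_id and the values differ, and "similarity" when
    at least one trusted_id is null or empty.
    """
    a, b = normalize_id(id_a), normalize_id(id_b)
    if a is not None and b is not None:
        return "always" if a == b else "never"
    return "similarity"


print(pair_decision("3", "3"))   # same trusted_id
print(pair_decision("1", "2"))   # different trusted_id values
print(pair_decision(None, "3"))  # one record has no trusted_id
```

A "similarity" result means only that the pair is left to the similarity comparison; it does not mean the records end up in the same cluster.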
Clustering Examples
Examples of Source Records Clustered Together
Example 1: These records are clustered together because they have highly similar values for the company name and address fields, indicating that they most likely represent the same company site.
Column | Record 1 Value | Record 2 Value | Record 3 Value |
---|---|---|---|
company_name | RAMSGATE CORPORATION | RAMSGATE CORPORATION | RAMSGATE CORPORATION |
address_line_1 | 7411 RIGGS ROAD STE 214 | 7411 RIGGS RD STE 214 | 7411 RIGGS ROAD STE 214 |
address_line_2 | | | |
alternative_names | | | |
city | HYATTSVILLE | HYATTSVILLE | |
country | USA: UNITED STATES OF AMERICA | USA: UNITED STATES OF AMERICA | USA: UNITED STATES OF AMERICA |
phone | 3018305395 | | |
postal_code | 20783 | 20783 | 20783 |
region | MD | MD | MD |
url | | | |
trusted_id | | | |
Expanding on this example, if the addresses in the three records are the same as above, but the company names are RAMSGATE CORPORATION, ESPRESSO EXPRESS, and RAMSGATE CORPORATION, Tamr Cloud clusters the two Ramsgate records together, but not the Espresso business.
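To give a feel for why the Ramsgate records group together while the Espresso record does not, here is a toy comparison using Python's standard `difflib.SequenceMatcher`. This is a stand-in chosen for illustration: this page does not describe how the model computes similarity, and the helper name `similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 character-level similarity score, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


# Values taken from Example 1 and its variation above.
names = ["RAMSGATE CORPORATION", "ESPRESSO EXPRESS", "RAMSGATE CORPORATION"]
addresses = ["7411 RIGGS ROAD STE 214", "7411 RIGGS RD STE 214"]

same_name = similarity(names[0], names[2])       # identical names
different_name = similarity(names[0], names[1])  # unrelated names
near_address = similarity(addresses[0], addresses[1])  # ROAD vs RD

print(f"same name: {same_name:.2f}")
print(f"different name: {different_name:.2f}")
print(f"near-identical address: {near_address:.2f}")
```

The identical names and the near-identical addresses score high, while the unrelated company name scores low even though the address is shared.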
Example 2: These records are clustered together because they have the same trusted_id value.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | LEGAL & GENERAL UCITS ETF | MALLINCKRODT |
address_line_1 | 2 GRAND CANAL SQUARE | CRUISERATH, BLANCHARDSTOWN DUBLIN 15 |
address_line_2 | | |
alternative_names | | |
city | DUBLIN | DUBLIN |
country | IE | IE |
phone | | |
postal_code | D15 TX2V | |
region | | |
url | | |
trusted_id | 3 | 3 |
Examples of Source Records Not Clustered Together
Example 1: These records are not clustered together because of highly dissimilar values in address_line_1, indicating that they most likely represent different company sites.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | G-W MANAGEMENT SERVICES, LLC | G-W MANAGEMENT SERVICES, LLC |
address_line_1 | 11600 NEBEL ST STE 202 | 5010 NICHOLSON LN STE 200 |
address_line_2 | | |
alternative_names | | |
city | ROCKVILLE | |
country | USA: UNITED STATES OF AMERICA | USA: UNITED STATES OF AMERICA |
phone | | |
postal_code | 20852 | 20852 |
region | MD | MD |
url | | |
trusted_id | | |
Example 2: These records are not clustered together because they have different trusted_id values.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | AMERICOLD | AMERICOLD |
address_line_1 | 10 Glenlake Pkwy Ste. 600 | 10 Glenlake Pkwy Ste. 600 |
address_line_2 | | |
alternative_names | | |
city | Atlanta | Atlanta |
country | US | US |
phone | | |
postal_code | 30328 | 30328 |
region | Georgia | Georgia |
url | | |
trusted_id | 1 | 2 |
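The effect of trusted_id in this example can be sketched by partitioning records on that field. This is a minimal illustration, not the template's implementation: the helper `group_by_trusted_id` is hypothetical, and records without a trusted_id are simply set aside here because they would be handled by similarity instead.

```python
from collections import defaultdict

# The two records from the table above, reduced to a few fields.
records = [
    {"company_name": "AMERICOLD",
     "address_line_1": "10 Glenlake Pkwy Ste. 600",
     "trusted_id": "1"},
    {"company_name": "AMERICOLD",
     "address_line_1": "10 Glenlake Pkwy Ste. 600",
     "trusted_id": "2"},
]


def group_by_trusted_id(rows):
    """Split rows into groups keyed by trusted_id, plus rows that have none."""
    groups = defaultdict(list)
    without_id = []
    for row in rows:
        trusted_id = (row.get("trusted_id") or "").strip()
        if trusted_id:
            groups[trusted_id].append(row)
        else:
            without_id.append(row)
    return dict(groups), without_id


groups, without_id = group_by_trusted_id(records)
for trusted_id, members in sorted(groups.items()):
    print(trusted_id, [m["company_name"] for m in members])
```

Although every other field matches, the two records land in separate groups because their trusted_id values differ.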