Legal Entity Mastering
Use the Legal Entity template to cluster data about companies, businesses, or organizations that represent the same legal entity.
This template suits compliance use cases and exploring investment opportunities based on the ultimate parent company. It includes a pre-configured schema, a trained machine learning model, and data cleaning services so that you can quickly get complete information for each legal entity.
Input Dataset Requirements
As part of the mastering flow, Tamr Cloud aligns your input datasets to a unified schema with predefined output fields. Tamr Cloud uses these predefined output fields to enrich your data and consolidate similar records into entities.
In addition to the general Requirements for Input Datasets, certain data is required for Legal Entity mastering.
You must map one or more input fields to each of the predefined output fields:
uniqueKey
This is the primary key for the dataset. See About Primary Keys for more information.
company_name
alternative_names
address_line_1
address_line_2
city
region
country
postal_code
phone
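As a pre-flight check before building the flow, you could confirm that every predefined output field has at least one input field mapped to it. The following is a minimal sketch, not part of Tamr Cloud: the function name `find_unmapped_fields`, the mapping structure, and the input column names are all hypothetical.

```python
# Predefined output fields listed above for the Legal Entity template.
REQUIRED_OUTPUT_FIELDS = [
    "uniqueKey",
    "company_name",
    "alternative_names",
    "address_line_1",
    "address_line_2",
    "city",
    "region",
    "country",
    "postal_code",
    "phone",
]


def find_unmapped_fields(mapping):
    """Return the output fields that have no input field mapped to them.

    `mapping` is a hypothetical dict of output field -> list of input columns.
    """
    return [field for field in REQUIRED_OUTPUT_FIELDS if not mapping.get(field)]


# Hypothetical mapping for an input dataset; the column names are made up.
example_mapping = {
    "uniqueKey": ["id"],
    "company_name": ["name", "legal_name"],
    "address_line_1": ["street"],
    "address_line_2": ["suite"],
    "city": ["city"],
    "region": ["state"],
    "country": ["country"],
    "postal_code": ["zip"],
}

print(find_unmapped_fields(example_mapping))
```

In this example, the check reports `alternative_names` and `phone` as still unmapped.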
Clustering Model
For Legal Entity mastering, the clustering model considers the similarity of values for the following fields, and then uses decision-tree logic to accurately identify records that refer to the same entity:
- Company name and alternate company names
- Legal entity name and alternative legal entity names, provided by the Company enricher
- Full address
- Country
Tamr Cloud looks for matches and highly similar values for company name and country. Full address is not required to match in the Legal Entity template. If you require full address to match, use the B2B Site Mastering template.
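To make the matching idea concrete, here is a toy illustration. It is not the trained Tamr Cloud model: it substitutes a simple string-similarity ratio from Python's `difflib` and a single hand-picked threshold for the model's decision-tree logic. The function names, the 0.85 threshold, and the assumption that `country` is already standardized are all illustrative.

```python
from difflib import SequenceMatcher


def name_similarity(name_a, name_b):
    """Return a 0..1 similarity ratio between two case-normalized names."""
    return SequenceMatcher(None, name_a.strip().lower(), name_b.strip().lower()).ratio()


def likely_same_entity(record_a, record_b, threshold=0.85):
    """Toy rule: same country and highly similar company name.

    Address fields are deliberately ignored, mirroring the statement that
    full address is not required to match in this template.
    """
    same_country = record_a["country"] == record_b["country"]
    similar_name = name_similarity(record_a["company_name"], record_b["company_name"]) >= threshold
    return same_country and similar_name


site_1 = {"company_name": "G-W MANAGEMENT SERVICES, LLC", "country": "US",
          "address_line_1": "11600 NEBEL ST STE 202"}
site_2 = {"company_name": "G-W MANAGEMENT SERVICES, LLC", "country": "US",
          "address_line_1": "5010 NICHOLSON LN STE 200"}

print(likely_same_entity(site_1, site_2))
```

The two records differ only in street address, so this toy rule treats them as the same entity.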
Clustering Examples for Legal Entity Mastering
Examples of Source Records Clustered Together
These records are clustered together because they have the same or highly similar values for company name and some address fields, but different street addresses, indicating that they most likely represent different sites for the same legal entity.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | G-W MANAGEMENT SERVICES, LLC | G-W MANAGEMENT SERVICES, LLC |
address_line_1 | 11600 NEBEL ST STE 202 | 5010 NICHOLSON LN STE 200 |
address_line_2 | | |
alternative_names | | |
city | ROCKVILLE | |
country | USA: UNITED STATES OF AMERICA | USA: UNITED STATES OF AMERICA |
phone | | |
postal_code | 20852 | 20852 |
region | MD | MD |
Examples of Source Records Not Clustered Together
These records are not clustered together because of missing or dissimilar address information, and only moderately similar company names, indicating that they most likely represent different legal entities.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | BALFOUR BEATTY LLC | Balfour Beatty Construction, LLC |
address_line_1 | SUITE 322 | 3100 Mckinnon St Fl 10 |
address_line_2 | | |
alternative_names | | |
city | WILMINGTON | Dallas |
country | US | United States |
phone | 214-451-1000 | |
postal_code | 19805 | 75201-7007 |
region | Texas |
Modifying the Mastering Flow
Designer Flow Step Overview
When you create a data product using the Legal Entity template, Tamr Cloud creates a mastering flow in Designer with steps specific to legal entity mastering:
- Add Data
- Align to Customer Data Model
- Prepare Data for Enrichment
- Enrich Company Name
- Enrich Country Code
- Prepare for Clustering
- Apply Clustering Model
- Consolidate Records
- Deliver Data to Studio
Add Data
Ensure your input data meets both the general requirements for input datasets and the specific requirements for this template.
Then, see Adding a Dataset.
Align to Customer Data Model
See Mapping Input Fields to a Unified Schema.
Prepare Data for Enrichment
This step transforms the data in the unified dataset to match the expected inputs to the enrichers included in the mastering flow.
This step also adds the following fields to the unified schema:
- full_address: This step concatenates all address values into a single full address.
- google_lookup: This step generates a Google search link for the company, using the value in the company_name field.
You do not need to make changes to this step.
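The two derived fields can be pictured with the following sketch. It is an approximation only: the order of the address fields, the comma separator, and the search URL format are assumptions, and the helper names `build_full_address` and `build_google_lookup` are hypothetical.

```python
from urllib.parse import quote_plus

# Assumed order of address fields; the actual step may order them differently.
ADDRESS_FIELDS = ["address_line_1", "address_line_2", "city", "region", "postal_code", "country"]


def build_full_address(record):
    """Join the non-empty address values into a single string."""
    parts = [str(record.get(field) or "").strip() for field in ADDRESS_FIELDS]
    return ", ".join(part for part in parts if part)


def build_google_lookup(record):
    """Build a Google search link from the company_name value (assumed URL format)."""
    return "https://www.google.com/search?q=" + quote_plus(record.get("company_name") or "")


record = {
    "company_name": "G-W MANAGEMENT SERVICES, LLC",
    "address_line_1": "11600 NEBEL ST STE 202",
    "address_line_2": "",
    "city": "ROCKVILLE",
    "region": "MD",
    "postal_code": "20852",
    "country": "USA: UNITED STATES OF AMERICA",
}

print(build_full_address(record))
print(build_google_lookup(record))
```

Empty values such as `address_line_2` are skipped so that the concatenated address does not contain dangling separators.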
Enrich Company Name
This step cleans and enriches company name data. You do not need to make changes to this step.
See Company Name Enrichment for information about this enricher, including the output fields it adds to your data.
Enrich Country Code
This step enriches records with standardized two-character ISO 3166-1 alpha-2 country codes. You do not need to make changes to this step.
See Country Code Enrichment for information about this enricher, including the output fields it adds to your data.
Prepare for Clustering
This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
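As a rough illustration of array-valued fields and "most similar values across the arrays", consider the sketch below. The field name `ml_names`, the helper functions, and the use of `difflib` for similarity are assumptions made for this example, and the company names are made up. The actual ml_ fields and similarity measures used by the trained model are not described here.

```python
from difflib import SequenceMatcher


def build_ml_names(record):
    """Collect the non-empty name values of a record into one normalized array.

    `ml_names` is a hypothetical field name used only for this sketch.
    """
    values = [record.get("company_name"), *(record.get("alternative_names") or [])]
    return [value.strip().lower() for value in values if value and value.strip()]


def best_similarity(left, right):
    """Return the highest pairwise similarity between two arrays of strings."""
    return max(
        (SequenceMatcher(None, a, b).ratio() for a in left for b in right),
        default=0.0,
    )


record_a = {"company_name": "Acme Corp", "alternative_names": ["Acme Corporation"]}
record_b = {"company_name": "ACME CORPORATION", "alternative_names": []}

names_a = build_ml_names(record_a)
names_b = build_ml_names(record_b)
print(best_similarity(names_a, names_b))
```

Comparing arrays rather than single values lets an alternative name on one record match the primary name on another.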
Apply Clustering Model
This read-only step groups records that refer to the same entity into a cluster, using the trained model.
Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Available Published Datasets.
Consolidate Records
This step applies rules to produce a single record, called the mastered entity record, that best represents a cluster. For most fields, these rules select the most common value from the clustered records.
Additionally, this step adds a Tamr ID (tamr_id) to each mastered entity record. The Tamr ID is a unique, persistent ID.
If you added a new field in the Schema Mapping step, add a line in the transformations to tell Tamr Cloud what value to set for that field when creating the mastered entity record. See Modifying Record Consolidation Transformations.
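The "most common value" rule can be sketched in a few lines. This is an illustrative approximation, not the transformation syntax that Tamr Cloud uses: the helper names are hypothetical, and ignoring empty values is an assumption.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if all are empty."""
    non_empty = [value for value in values if value not in (None, "")]
    if not non_empty:
        return None
    return Counter(non_empty).most_common(1)[0][0]


def consolidate(cluster, fields):
    """Build one representative record from the records in a cluster."""
    return {field: most_common_value([rec.get(field) for rec in cluster]) for field in fields}


cluster = [
    {"company_name": "G-W MANAGEMENT SERVICES, LLC", "city": "ROCKVILLE", "region": "MD", "phone": ""},
    {"company_name": "G-W MANAGEMENT SERVICES, LLC", "city": "", "region": "MD", "phone": ""},
    {"company_name": "G-W MANAGEMENT SERVICES LLC", "city": "ROCKVILLE", "region": "Maryland", "phone": ""},
]

mastered = consolidate(cluster, ["company_name", "city", "region", "phone"])
print(mastered)
```

A field that is empty on every clustered record, such as `phone` here, stays empty on the consolidated record.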
Deliver Data to Studio
This step allows you to configure how data appears in Studio, Curator, and published datasets. See Configuring Data Display in Studio.