Features of Legal Entities
This template enhances your data with standardized values and consolidates similar records into grouped entities.
About the Data Quality Services
The legal entities template includes data quality services for these attributes:
The data quality process examines values for the template's attributes, and adds any resulting validated, standardized values to each record in new enrichment-specific attributes. The original values mapped from your source datasets remain present and unchanged. See the topics linked above for processing details and added attributes.
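The enrichment behavior described above (new attributes added, mapped source values left untouched) can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the `enrich` function, the `_standardized` suffix, and the toy standardizer are hypothetical and do not reflect the template's actual attribute names or services.

```python
def enrich(record, standardizers):
    """Return a copy of record with standardized values added as new keys.

    `standardizers` maps an attribute name to a function that returns a
    standardized value. The "_standardized" suffix is a hypothetical naming
    convention used only in this sketch.
    """
    enriched = dict(record)  # copy, so the caller's record is never modified
    for attribute, standardize in standardizers.items():
        value = record.get(attribute)
        if value:  # skip missing or empty source values
            enriched[attribute + "_standardized"] = standardize(value)
    return enriched


# Toy standardizer: trim whitespace and upper-case the value.
source = {"country": " us ", "company_name": "Americold"}
result = enrich(source, {"country": lambda v: v.strip().upper()})
```

In this sketch the original `country` value is still present in `result`, alongside the added `country_standardized` value.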
About the Clustering Model
The legal entities model groups company records as follows:
First, by trusted_id. Records with the same trusted_id are always clustered together. Records with different trusted_ids are never clustered together. Records with a null or empty trusted_id are clustered based on similarity, meaning that they may be clustered with records that have a trusted_id.
Then, by similarity. Records with null or empty trusted_ids are clustered based on similarities between values for these attributes:
- Company name and alternate company names
- Legal entity name and alternative legal entity names, provided by the Company enricher
- Full address
- Country
- Website
Note: Generic descriptions, rather than specific attribute names, are listed to represent both the standard schema and the attributes added by the enrichers and other data transformations.
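The two-stage grouping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the template's actual model: the similarity score is a stand-in built on Python's `difflib`, the `THRESHOLD` value is arbitrary, and the field list borrows column names from the example tables on this page. Only the trusted_id rules come from the description above.

```python
from difflib import SequenceMatcher

# Assumptions for illustration only: field list and threshold are hypothetical.
SIMILARITY_FIELDS = ("company_name", "address_line_1", "country", "url")
THRESHOLD = 0.8


def has_trusted_id(record):
    """True when trusted_id is present and not blank."""
    value = record.get("trusted_id")
    return value is not None and str(value).strip() != ""


def similarity(a, b):
    """Average string similarity over fields populated in both records."""
    scores = []
    for field in SIMILARITY_FIELDS:
        left = (a.get(field) or "").strip().lower()
        right = (b.get(field) or "").strip().lower()
        if left and right:
            scores.append(SequenceMatcher(None, left, right).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def cluster(records):
    """Return a list of clusters, each a list of records."""
    clusters = []
    by_trusted_id = {}

    # Step 1: records that share a trusted_id always go together, and
    # each distinct trusted_id gets its own cluster.
    for record in records:
        if has_trusted_id(record):
            key = str(record["trusted_id"]).strip()
            if key not in by_trusted_id:
                by_trusted_id[key] = []
                clusters.append(by_trusted_id[key])
            by_trusted_id[key].append(record)

    # Step 2: a record without a trusted_id joins the single most similar
    # cluster (which may contain records that have a trusted_id), or
    # starts a new cluster when nothing reaches the threshold.
    for record in records:
        if has_trusted_id(record):
            continue
        best, best_score = None, 0.0
        for members in clusters:
            score = max(similarity(record, m) for m in members)
            if score >= THRESHOLD and (best is None or score > best_score):
                best, best_score = members, score
        if best is None:
            clusters.append([record])
        else:
            best.append(record)
    return clusters
```

Because a record without a trusted_id joins at most one existing cluster in this sketch, two clusters seeded by different trusted_ids are never merged, which mirrors the rule that records with different trusted_ids are never clustered together.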
Clustering Examples for Legal Entities
Examples of Source Records Clustered Together
Example 1: These records are clustered together because they have the same or highly similar values for company name and some address fields, but different street addresses, indicating that they most likely represent different sites for the same legal entity.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | G-W MANAGEMENT SERVICES, LLC | G-W MANAGEMENT SERVICES, LLC |
address_line_1 | 11600 NEBEL ST STE 202 | 5010 NICHOLSON LN STE 200 |
address_line_2 | ||
alternative_names | ||
city | ROCKVILLE | |
country | USA: UNITED STATES OF AMERICA | USA: UNITED STATES OF AMERICA |
phone | ||
postal_code | 20852 | 20852 |
region | MD | MD |
url | ||
trusted_id | ||
Example 2: These records are clustered together because they have the same trusted_id.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | LEGAL & GENERAL UCITS ETF | MALLINCKRODT |
address_line_1 | 2 GRAND CANAL SQUARE | CRUISERATH, BLANCHARDSTOWN DUBLIN 15 |
address_line_2 | ||
alternative_names | ||
city | DUBLIN | DUBLIN |
country | IE | IE |
phone | ||
postal_code | D15 TX2V | |
region | ||
url | ||
trusted_id | 3 | 3 |
Examples of Source Records Not Clustered Together
Example 1: These records are not clustered together because of missing or dissimilar address and website information, and only moderately similar company names, indicating that they most likely represent different legal entities.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | BALFOUR BEATTY LLC | Balfour Beatty Construction, LLC |
address_line_1 | SUITE 322 | 3100 Mckinnon St Fl 10 |
address_line_2 | ||
alternative_names | ||
city | WILMINGTON | Dallas |
country | US | United States |
phone | 214-451-1000 | |
postal_code | 19805 | 75201-7007 |
region | Texas | |
url | https://www.balfourbeatty.com/ | |
trusted_id | ||
Example 2: These records are not clustered together because they have different values for trusted_id.
Column | Record 1 Value | Record 2 Value |
---|---|---|
company_name | AMERICOLD | AMERICOLD |
address_line_1 | 10 Glenlake Pkwy Ste. 600 | 10 Glenlake Pkwy Ste. 600 |
address_line_2 | ||
alternative_names | ||
city | Atlanta | Atlanta |
country | US | US |
phone | ||
postal_code | 30328 | 30328 |
region | Georgia | Georgia |
url | ||
trusted_id | 1 | 2 |