Source Dataset Requirements for Legal Entities

Align your source data with the industry-standard schema for company data that this template supplies.

The legal entities template includes a predefined, standardized schema for company data. The mastering flow for data products produced by this template includes a schema mapping step in which you identify how columns in your source datasets correspond to the attributes in the supplied schema.

To prepare, review the general Requirements for Source Datasets. Then, identify the column or columns in each of your source datasets that you will map to attributes in the unified schema. After you map your source data columns, Tamr Cloud can enrich your data and consolidate similar records into entities.

The table below describes these attributes and indicates whether each one is:

  • Required: Until you map source columns to these attributes, the Schema Mapping step is marked incomplete and you cannot run the flow.
  • Recommended: For optimal enrichment and clustering results, map source columns to these attributes.
  • Optional: These attributes have minimal impact on your clustering and enrichment results. If your source data includes columns that match these attributes, map them to include that source data in your completed data product.
Unified Attribute | Description | Type
Address_Line_1 | Line 1 of the company’s address. | Recommended
Address_Line_2 | Line 2 of the company’s address. | Optional
Address_Type | Whether the company address represents the headquarters or a branch location. | Optional
Alternative_Names | Alternate names for the company. | Optional
Associated_Persons | Officers, directors, and other people associated with the company. | Optional
City | City of the company’s address. | Recommended
Company_Name | Company’s name. | Required
Company_Registration_Number | Legal entity ID registered with the local government. | Optional
Company_Type | Legal status of the company (for example, LLC or LTD). | Optional
Country | Country of the company’s address. | Recommended
Founding_Year | Year the company was formed. | Optional
Phone | Company’s phone number. | Recommended
Postal_Code | Postal (zip) code of the company’s address. | Recommended
Previous_Names | Previous legal names for the company. | Optional
primaryKey | The primary key used in the source dataset to uniquely identify each record. See About Primary Keys for more information. | Required
Region | Region or state of the company’s address. | Recommended
Stock_Exchange | Stock exchange where the company is listed (if public). | Optional
Tax_IDs | Tax identification numbers, such as EINs. | Optional
Ticker_Symbol | Company’s ticker symbol on the stock exchange. | Optional
trusted_id | A non-unique key, such as a customer identification number used by your internal systems. The clustering model always clusters together records that have the same trusted_id. If your data does not include identifiers that represent a definite match, do not map any columns to trusted_id. | Optional
URL | Primary website domain for the company. | Recommended
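One way to picture the trusted_id rule is as a must-link constraint applied on top of whatever clusters a matching model proposes. The Python sketch below illustrates only that rule, under stated assumptions: `apply_trusted_id_constraint` is a hypothetical helper, the model's output is assumed to be a dictionary of record IDs to cluster labels, and this is not how Tamr Cloud implements clustering internally. It uses a union-find structure to merge every cluster that contains records sharing a trusted_id, and leaves records without a trusted_id exactly as the model assigned them.

```python
def apply_trusted_id_constraint(model_clusters, trusted_ids):
    """Merge clusters so that records sharing a trusted_id end up together.

    model_clusters: dict of record ID -> cluster label proposed by a matching
                    model (hypothetical stand-in for the clustering model's output).
                    Labels are assumed to be mutually comparable, e.g. strings.
    trusted_ids:    dict of record ID -> trusted_id; records may be absent or None.
    Returns a dict of record ID -> final cluster label.
    """
    # Union-find over cluster labels; every label starts as its own root.
    parent = {label: label for label in model_clusters.values()}

    def find(label):
        # Walk to the root, halving the path as we go.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller label as the root so results are deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    # Remember the first cluster seen for each trusted_id and link later ones to it.
    first_cluster = {}
    for record, label in model_clusters.items():
        tid = trusted_ids.get(record)
        if tid is None:
            continue  # No trusted_id: the model's assignment stands.
        if tid in first_cluster:
            union(first_cluster[tid], label)
        else:
            first_cluster[tid] = label

    return {record: find(label) for record, label in model_clusters.items()}


# Hypothetical example: r1 and r2 share a customer ID, so their clusters merge.
clusters = {"r1": "c1", "r2": "c2", "r3": "c3", "r4": "c3"}
trusted = {"r1": "CUST-001", "r2": "CUST-001"}
print(apply_trusted_id_constraint(clusters, trusted))
# {'r1': 'c1', 'r2': 'c1', 'r3': 'c3', 'r4': 'c3'}
```

The sketch deliberately does not force records with different trusted_ids apart, because the rule in the table only guarantees that matching trusted_ids are clustered together.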

Tip: You can also add attributes to the unified schema and map to them any source columns that you want to include in the mastered data product. The template does not use these additional attributes in the mastering process.
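Before you map columns in Tamr Cloud, you might check a planned mapping against the Required, Recommended, and Optional classifications in the attribute table. The Python sketch below is one way to do that, assuming you write the plan as a dictionary of source column names to unified attribute names. `review_mapping`, the constant names, and the example columns (including the custom `Account_Owner` attribute) are hypothetical and are not part of Tamr Cloud. The sketch reports missing required attributes, which keep the Schema Mapping step incomplete; missing recommended attributes, which can weaken enrichment and clustering; and any added attributes, which the template carries into the data product but does not use for mastering.

```python
# Attribute classifications copied from the attribute table in this section.
REQUIRED = {"Company_Name", "primaryKey"}
RECOMMENDED = {
    "Address_Line_1", "City", "Country", "Phone",
    "Postal_Code", "Region", "URL",
}
OPTIONAL = {
    "Address_Line_2", "Address_Type", "Alternative_Names", "Associated_Persons",
    "Company_Registration_Number", "Company_Type", "Founding_Year",
    "Previous_Names", "Stock_Exchange", "Tax_IDs", "Ticker_Symbol", "trusted_id",
}
TEMPLATE_ATTRIBUTES = REQUIRED | RECOMMENDED | OPTIONAL


def review_mapping(mapping):
    """Summarize a hypothetical {source_column: unified_attribute} mapping."""
    mapped = set(mapping.values())
    missing_required = sorted(REQUIRED - mapped)
    return {
        # Any missing required attribute keeps the Schema Mapping step incomplete.
        "missing_required": missing_required,
        "ready_to_run": not missing_required,
        # Missing recommended attributes do not block the flow.
        "missing_recommended": sorted(RECOMMENDED - mapped),
        # Attributes you added yourself: kept in the output, not used for mastering.
        "custom": sorted(mapped - TEMPLATE_ATTRIBUTES),
    }


# Hypothetical mapping for one source dataset.
mapping = {
    "cust_id": "primaryKey",
    "name": "Company_Name",
    "street": "Address_Line_1",
    "town": "City",
    "country_code": "Country",
    "account_owner": "Account_Owner",  # hypothetical custom attribute
}
report = review_mapping(mapping)
print(report["ready_to_run"])         # True
print(report["missing_recommended"])  # ['Phone', 'Postal_Code', 'Region', 'URL']
print(report["custom"])               # ['Account_Owner']
```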