Glossary

Definitions for words commonly used when interacting with Tamr Cloud.

A

Admin
A role that allows users to fully manage all resources and user accounts in Tamr Cloud.

Attribute
A data field in a mastered entity.

Author
A role that allows users to add new resources to Tamr Cloud, such as data products, connections, and sources, and fully manage the resources that they own.

Automap
A feature that allows users to automatically map source columns to appropriate attributes in the unified schema.

B

Blank Value
A string with only whitespace or 0 characters.

Business Type
Semantic data types (ID, address, phone number, date, or data) that can help users identify key attributes in a data product. Also referred to as an attribute type.

C

Cluster
Source records that refer to the same real-world entity and have been grouped together by the data mastering process.

Company Site
A specific branch or location for a company.

Connections
Integrations to external data repositories, such as Snowflake or AWS S3. Connections allow you to import source datasets and export data product data.

Curate
The process of reviewing, adjusting, and maintaining mastered datasets for use by other users or downstream systems.

Curator
A user permission for data products that allows the user to view, publish, and curate the data product.

D

D&B Datablocks
Company enrichment data managed and provided by Dun & Bradstreet. This data is organized into logical and thematic groupings called Data Blocks, such as “Company Information” and “Hierarchies and Connections”. See the D&B Direct+ documentation and the D&B Enrich topic.

D-U-N-S Number
A unique, proprietary nine-digit identifier for businesses provided by Dun & Bradstreet. See Dun & Bradstreet and the D-U-N-S Match topic.

Data Citizen
A role that allows users to access resources shared with them in Tamr Cloud, such as data products and connections, but does not allow them to create new resources.

Data Product
A data product is a consumption-ready set of high-quality, trustworthy, and accessible data that people across an organization can use to solve business challenges. Data products are comprehensive, clean, curated, regularly updated datasets for key business entities that both people and downstream systems can consume broadly and securely across an enterprise.

Data Product Template
A turnkey, entity-specific solution for creating high quality data products. Data product templates provide steps that align datasets to industry-standard schema, clean and enrich the data, cluster duplicate source records, and produce a mastered entity for each cluster with the most accurate, up-to-date, and thorough data possible.

Deduplication
The process of dataset mastering, also called entity consolidation. In Tamr Cloud, the trained clustering model deduplicates data by identifying and consolidating records from source datasets that refer to the same real-world entity.

E

Editor
A user permission that provides edit access to a resource, such as a data product, connection, export destination, or source.

Empty Value
Either a true null value or a blank value. A blank value is a string with only whitespace or 0 characters.
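
The distinction between null, blank, and empty values can be sketched in Python. This is a minimal illustration only: the function names are hypothetical, Python's None stands in for a true null, and "whitespace" is assumed to mean whatever str.strip() removes.

```python
from typing import Optional


def is_null(value: Optional[str]) -> bool:
    """A true null value: no value at all (None in this sketch)."""
    return value is None


def is_blank(value: Optional[str]) -> bool:
    """A blank value: a string with only whitespace or 0 characters."""
    return value is not None and value.strip() == ""


def is_empty(value: Optional[str]) -> bool:
    """An empty value: either a true null value or a blank value."""
    return is_null(value) or is_blank(value)
```

Under these assumptions, is_empty(None), is_empty(""), and is_empty("   ") are all true, while only the last two are blank.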

Enriched Data
Data that has been validated and standardized by Tamr Enrichment services, and supplemented with external data.

Enrichment
The process of integrating data quality services and related external data to produce more complete and accurate mastered datasets.

Entity/Entities
Generic term for the results of a mastering flow. An entity includes both the mastered entity record and clustered source records.

F

Firmographic Data
Reference data about an organization, such as its revenue, industry, location, corporate hierarchy, number of employees, and so on. See Tamr’s Firmographic Enrichment.

I

Input Datasets
See Source Datasets.

L

Legal Entity
A company or organization that has legal rights and obligations.

M

Mastered Patient Index (MPI)
An index used by healthcare organizations to maintain accurate medical data across their various departments. This index applies unique identifiers to patients so that they are represented only once across multiple organizations.

Machine Learning (ML)
A type of AI (artificial intelligence) system that finds and uses patterns in data to achieve a goal. Tamr Cloud clustering models use unsupervised machine learning models to group similar records into clusters by applying the same algorithms consistently.

Mastered Entity
A record that best represents a cluster of source records. Mastered entities are created by the Record Consolidation step in the Tamr Cloud mastering flow, which applies rules to select the most appropriate value from the clustered source records for each attribute in the mastered entity.
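
As a rough illustration of rule-based value selection, the sketch below builds one mastered record from a cluster by taking the most common non-empty value for each attribute. The rule, the function name consolidate, and the sample field names are assumptions made for this example; the rules that Tamr Cloud actually applies are not described here.

```python
from collections import Counter


def consolidate(cluster, attributes):
    """Build one mastered record from a list of clustered source records.

    Illustrative rule (an assumption): pick the most common non-empty value
    for each attribute, or None when every source value is empty.
    """
    mastered = {}
    for attribute in attributes:
        values = [record.get(attribute) for record in cluster]
        non_empty = [v for v in values if v is not None and v.strip() != ""]
        if non_empty:
            mastered[attribute] = Counter(non_empty).most_common(1)[0][0]
        else:
            mastered[attribute] = None
    return mastered


# Hypothetical cluster of three source records for the same company.
cluster = [
    {"name": "Acme Corp", "city": "Boston"},
    {"name": "Acme Corp", "city": " "},
    {"name": "ACME", "city": "Boston"},
]
mastered_entity = consolidate(cluster, ["name", "city", "phone"])
```

Other rules, such as preferring the most recently updated source or the longest value, would fit the same shape.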

Mastering Flow
A template-specific set of steps that aligns your data to a unified schema, enriches the data, clusters similar records, and creates the mastered entity for each cluster.

N

Null Value
A true null value.

P

Primary Key
A primary key is a single field that uniquely identifies a record in a dataset.

Primary keys are unique and stable over time:

  • Unique: each primary key appears only once in the dataset.
  • Stable: the key for a given record does not arbitrarily change over time.

Tamr recommends that the primary key be meaningful to the data, as this reduces the likelihood of breaking changes upstream.

For example, if there is a designated primary key in the source system, it may be best to use this as the primary key, rather than another unique key.
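
The uniqueness property can be checked before import with a short script. This is a minimal sketch that assumes records are Python dictionaries; duplicate_keys and the sample field names are hypothetical. Stability is not checked here, since it requires comparing snapshots of the dataset over time.

```python
from collections import Counter


def duplicate_keys(records, key_field):
    """Return the key values that appear more than once in the dataset."""
    counts = Counter(record[key_field] for record in records)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical source records; "customer_id" is the candidate primary key.
records = [
    {"customer_id": "c-1", "name": "Acme Corp"},
    {"customer_id": "c-2", "name": "Globex"},
    {"customer_id": "c-2", "name": "Globex Inc"},
]
duplicates = duplicate_keys(records, "customer_id")
```

An empty result means that the field satisfies the uniqueness requirement for this dataset.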

Publish
A user permission for data products that allows the user to view and export the data product.

Also, the action of exporting data product datasets to external data repositories or downloading them locally.

R

Record Consolidation
The process of applying rules to create a mastered entity record during the mastering flow. See Mastered Entity.

Reference Data
Data used to reference or define other data. For example, firmographic data is reference data about organizations. See Firmographic Data.

S

Schema
The organization and structure of a dataset. See Database schema.

Schema Mapping
The process of mapping source dataset columns to attributes in a unified schema.
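
Conceptually, a schema mapping is a lookup from each unified attribute to the source columns that feed it. The sketch below is illustrative only: map_record, the column names, and the choice to collect mapped values in a list are assumptions, not a description of how Tamr Cloud stores or applies mappings.

```python
def map_record(source_record, mapping):
    """Collect, for each unified attribute, the values of its mapped source columns."""
    return {
        attribute: [source_record[c] for c in columns if c in source_record]
        for attribute, columns in mapping.items()
    }


# Hypothetical mapping: two source columns feed the single "address" attribute.
mapping = {
    "company_name": ["co_name"],
    "address": ["addr1", "addr2"],
}
source_record = {"co_name": "Acme", "addr1": "1 Main St", "addr2": "Suite 5"}
unified_record = map_record(source_record, mapping)
```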

Site Mastering
Mastering data at the site or location level for a company.

Source
A dataset that is imported to Tamr Cloud for mastering. Also referred to as a source dataset or input dataset.

Source Columns
A column in a source dataset that is added to Tamr Cloud.

Source Record
A record in a source dataset.

T

Tamr Enrich
A firmographic enrichment service that appends mastered company data with reference data, such as name, address, website, and operating status of each company and its parent organizations.

Tamr Enrich ID
A unique, proprietary organizational identifier used to enrich your source data with Tamr’s firmographic data or other external reference data. See Tamr Enrich ID.

Tamr ID
A persistent identifier assigned by Tamr that serves as a long-lasting reference across multiple data sources. Tamr IDs act as a primary key that organizations can use across multiple databases when there is no common identifier for the same entity, such as a person, a product, or an organization.

Transformation
Data transformation changes the format, structure, or values of data in a unified dataset.

Trusted_ID
A non-unique key that guarantees that two distinct records with the same trusted_id are placed in the same cluster.
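
One way to read this guarantee is as a property that clustered output can be checked against. The sketch below assumes each record is a dictionary with hypothetical trusted_id and cluster_id fields, and reports any trusted_id whose records ended up in more than one cluster.

```python
from collections import defaultdict


def trusted_id_violations(records):
    """Return trusted_id values whose records span more than one cluster."""
    clusters_by_trusted_id = defaultdict(set)
    for record in records:
        trusted_id = record.get("trusted_id")
        if trusted_id is None:
            continue  # records without a trusted_id carry no guarantee
        clusters_by_trusted_id[trusted_id].add(record["cluster_id"])
    return [
        trusted_id
        for trusted_id, cluster_ids in clusters_by_trusted_id.items()
        if len(cluster_ids) > 1
    ]


# Hypothetical clustered records; "t-2" is split across two clusters.
clustered = [
    {"trusted_id": "t-1", "cluster_id": "A"},
    {"trusted_id": "t-1", "cluster_id": "A"},
    {"trusted_id": "t-2", "cluster_id": "B"},
    {"trusted_id": "t-2", "cluster_id": "C"},
    {"trusted_id": None, "cluster_id": "C"},
]
violations = trusted_id_violations(clustered)
```

If the guarantee holds, the list is empty. The key is non-unique because several records are expected to share the same value.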

U

Unified Attribute
Derived attributes that you create by mapping one or more columns from your input datasets into a single attribute in the schema for the unified dataset. See Attribute.

Unified Dataset
The result of aligning source dataset columns to the attributes provided by a data product template.

V

Viewer
A user permission that provides read-only access to a resource, such as a data product, connection, export destination, or source.