About Persistent Identifier Fields
Tamr Cloud assigns a unique, persistent identifier to each entity. The same identifier is added to each source record in a cluster and to the mastered entity record.
When you add a source dataset to Tamr Cloud, the source dataset must include a field for the unique primary key. During the mastering process, Tamr Cloud also assigns a unique, persistent identifier to each business entity. The same identifier is added to each source record in a source record cluster and to the entity.
Primary Key Fields from Source Datasets
The following field stores the unique primary key from the source datasets. This field is available in step output in Designer, in source record tables in the Tamr Cloud UI, and in exported datasets, as described in the table below.
Field | Step Output or Exported Dataset | Notes |
---|---|---|
entityId | Apply Clustering Model Step; Source Records by Entities Dataset; Entity Source Record tables | Unlike the entity_id field in the Consolidate Records step output, which stores the Tamr Cloud-generated persistent identifier for entities and source record clusters, this field stores the tamr_record_id. For data products created before June 1, 2023, the tamr_record_id is the concatenation of the source dataset name and the source primary key, separated by an underscore. For data products created on or after June 1, 2023, the tamr_record_id is a 128-bit hash value of the source dataset name and the source primary key. |
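The two tamr_record_id formats described in the table can be sketched in Python as follows. This is an illustration only: the function names and sample values are hypothetical, and this page does not name the hash algorithm or say how the inputs are combined before hashing. MD5 is used here purely as an example of a 128-bit hash, so the output will not necessarily match the values Tamr Cloud generates.

```python
import hashlib


def legacy_record_id(dataset_name: str, primary_key: str) -> str:
    """Pre-June 1, 2023 format: dataset name and primary key joined by an underscore."""
    return f"{dataset_name}_{primary_key}"


def hashed_record_id(dataset_name: str, primary_key: str) -> str:
    """Stand-in for the newer format: a 128-bit hash of dataset name and primary key.

    Assumption: MD5 over the underscore-joined string. The actual algorithm
    and input encoding used by Tamr Cloud are not documented on this page.
    """
    digest = hashlib.md5(f"{dataset_name}_{primary_key}".encode("utf-8")).digest()
    return digest.hex()  # 16 bytes = 128 bits, rendered as 32 hex characters


print(legacy_record_id("crm_accounts", "1042"))           # crm_accounts_1042
print(len(hashed_record_id("crm_accounts", "1042")) * 4)  # 128
```

One practical difference between the formats: the concatenated form can be read by eye to recover the source dataset and key, while a hashed form has a fixed length and cannot be split back into its parts.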
Persistent Identifiers Assigned by Tamr Cloud
The following fields store the same Tamr Cloud-generated identifier for a mastered entity record and its clustered source records. The Tamr Cloud-generated identifier is a 128-bit universally unique identifier (UUID).
These fields are available by viewing step output in the Configure Flow page, and in source record tables in the Tamr Cloud UI and exported datasets, as described in the table below.
Field | Step Output or Exported Dataset | Notes |
---|---|---|
clusterId1 | Entities by Similarity Dataset | This is the first entity in the pair of entities being compared for similarity. See Datasets Available for Export for more information. |
clusterId2 | Entities by Similarity Dataset | This is the second entity in the pair of entities being compared for similarity. See Datasets Available for Export for more information. |
entity_id | Consolidate Records Step | |
Entity ID | Entities Dataset | This is the default field name of the entity_id output field as configured in the Configure Attributes step. Depending on your step configuration, this field might have a different name. |
persistentId | Apply Clustering Model Step; Source Records by Entities Dataset; Entity Source Record tables | |
suggestedClusterId | Apply Clustering Model Step; Source Records by Entities Dataset; Entity Source Record tables | This is the persistent ID assigned by the clustering model. If a user overrides clusters, the persistent ID for the new source record cluster is available in the verifiedClusterId field. |
tamr_id | Consolidate Records Step | |
Tamr_ID | Entities Dataset | This is the default field name of the tamr_id output field as configured in the Configure Attributes step. Depending on your step configuration, this field might have a different name. |
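Because the Tamr Cloud-generated identifier is a 128-bit UUID, a consumer of exported datasets may want to sanity-check values in these fields before joining on them. The following is a minimal sketch using Python's standard uuid module. The helper name is hypothetical, and the sample value is the one used in the example later on this page.

```python
import uuid


def is_persistent_id(value: str) -> bool:
    """Return True if value parses as a UUID (a 128-bit identifier)."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


sample = "e6ef4554-21a8-38d6-96b7-423d87640455"
print(is_persistent_id(sample))          # True
print(len(uuid.UUID(sample).bytes) * 8)  # 128
print(is_persistent_id(""))              # False
```

An empty string fails the check, which is useful for fields that are empty when no value has been assigned.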
Additional Fields for Cluster Overrides
The following fields store values only for source records to which a user has applied cluster overrides. If cluster overrides have not been applied, these fields are empty. These fields are available by viewing step output in the Configure Flow page, and in source record tables and exported datasets, as described in the table below.
Field | Step Output or Exported Dataset | Notes |
---|---|---|
verifiedClusterId | Apply Clustering Model Step; Source Records by Entities Dataset; Entity Source Record tables | If a user overrides clusters, this field stores the persistentId of the source record cluster to which the record was moved. Note: The source record cluster assigned by the model is stored in the suggestedClusterId field. |
verificationType | Apply Clustering Model Step; Source Records by Entities Dataset; Entity Source Record tables | The value in this field indicates how Tamr Cloud applied the override. This field is always set to suggest, meaning that Tamr Cloud applied the override after applying the clustering model. |
Example: Persistent Identifiers Available after Clustering
In the following example, a user has moved a source record for the A&H Automotive Industries company from the source record cluster suggested by the clustering model into a source record cluster for the A&L Sanchez Painting company. Before the user applied cluster overrides, the A&L Sanchez Painting company source record cluster included two source records.
This example shows the output of the Apply Clustering Model step, filtered to the relevant fields. The same fields are also included in the source record tables and in the Source Records by Entities exported dataset.

Persistent identifier example
- suggestedClusterId: This field provides the persistent identifier for the source record cluster created by the clustering model. The clustering model assigned the A&H Automotive Industries record to a different source record cluster than the records for A&L Sanchez Painting and Construction Company:
  - For A&H Automotive Industries, the suggestedClusterId is e6ef4554-21a8-38d6-96b7-423d87640455.
  - For the two A&L Sanchez Painting and Construction Company records, the suggestedClusterId is 1ed1f847-e992-3fa6-a143-70a1a9cbd0d5.
- verificationType: This field indicates how Tamr Cloud applied the cluster override. This field is always set to suggest, meaning that Tamr Cloud applied the override after applying the clustering model.
- verifiedClusterId: This field provides the persistent identifier for the source record cluster after overrides were applied. The records for A&H Automotive Industries and A&L Sanchez Painting and Construction have been assigned to the same source record cluster through cluster overrides: 1ed1f847-e992-3fa6-a143-70a1a9cbd0d5.
- entityId: This field provides the primary key value for each record from the input dataset.
- persistentId: This field provides the final persistent identifier for all records in the source record cluster. Note that the field value is the same as the value of the verifiedClusterId: 1ed1f847-e992-3fa6-a143-70a1a9cbd0d5.
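The relationship between the three cluster fields in this example can be sketched as a simple fallback rule. This sketch assumes that persistentId equals verifiedClusterId when an override exists and otherwise equals suggestedClusterId. The example confirms only the override case, so treat the fallback as an assumption. The function name, the record labels, and the choice to leave verifiedClusterId empty for the two A&L records are illustrative and not taken from Tamr Cloud output.

```python
from typing import Optional

SUGGESTED_AH = "e6ef4554-21a8-38d6-96b7-423d87640455"
SUGGESTED_AL = "1ed1f847-e992-3fa6-a143-70a1a9cbd0d5"


def final_cluster_id(suggested: str, verified: Optional[str]) -> str:
    """Prefer the override target; fall back to the model's suggestion (assumed)."""
    return verified if verified else suggested


# (label, suggestedClusterId, verifiedClusterId); labels are hypothetical
records = [
    ("A&H Automotive Industries", SUGGESTED_AH, SUGGESTED_AL),  # moved by a user
    ("A&L Sanchez record 1", SUGGESTED_AL, None),
    ("A&L Sanchez record 2", SUGGESTED_AL, None),
]

persistent_ids = {final_cluster_id(s, v) for _, s, v in records}
print(persistent_ids)  # {'1ed1f847-e992-3fa6-a143-70a1a9cbd0d5'}
```

All three records resolve to the same identifier, which matches the single source record cluster described in the example.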