Maintaining Tamr IDs
Tamr Cloud requires persistent primary keys in order to maintain persistent Tamr IDs.
You must preserve primary keys and maintain the source dataset name if you update the source dataset; otherwise the Tamr ID will change.
There are several situations in which the Tamr ID for clustered source records and the golden record (mastered entity) can change:
- The source record primary key changes.
- The source dataset name changes.
- The source record's values change so that it no longer matches its previously assigned cluster.
- New source records are added that match existing records and affect clustering.
- External enrichment or data quality service providers update their data; changed enrichment values can impact clustering.
- You select different or additional external enrichment providers; new enrichment values can impact clustering.
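The first two conditions in the list above can be checked before you update a source dataset. The following is a minimal sketch, not a Tamr Cloud API: the function name `check_key_stability`, the dataset name, and the key values are all hypothetical, and it assumes you can list the primary keys of the current and updated versions of the dataset yourself.

```python
def check_key_stability(old_name, old_keys, new_name, new_keys):
    """Compare two versions of a source dataset (hypothetical helper).

    Reports a changed dataset name and any primary keys that disappear
    or appear in the update, since either can lead to new Tamr IDs.
    """
    old_set, new_set = set(old_keys), set(new_keys)
    return {
        "dataset_renamed": old_name != new_name,
        # Keys in the current version that are absent from the update.
        "missing_keys": sorted(old_set - new_set),
        # Keys that exist only in the update (treated as new records).
        "added_keys": sorted(new_set - old_set),
    }


# Illustrative values only; real keys come from your own source system.
report = check_key_stability(
    "crm_contacts", ["c-001", "c-002", "c-003"],
    "crm_contacts", ["c-001", "c-002", "k-9001"],
)
print(report)
```

A non-empty `missing_keys` list alongside a similar number of `added_keys` often suggests that keys were regenerated rather than that records were truly removed and added, which is the situation described in Example 1.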
Consider the examples below:
Example 1: The primary keys of clustered source records have changed, resulting in a new Tamr ID.
In this example, four source records are clustered together into the mastered Victor Marks entity. The mastered entity and its clustered records have been assigned a persistent Tamr ID, as shown in the Tamr ID and Persistent ID columns in the image below.
The source dataset is then updated in Tamr Cloud, and the updated dataset includes different primary key values for the clustered source records, as shown in the Unique Key column in the image below. When the mastering flow is run, Tamr Cloud considers these to be new records because their primary key values have changed. As a result, the records are still clustered together, but the records and the mastered Victor Marks entity are assigned a new Tamr ID, as shown in the Tamr ID and Persistent ID columns in the image below. The original Tamr ID is retired.
Example 2: The values in a clustered source record have changed, resulting in that source record being clustered into a different mastered entity.
In this example, three source records are clustered into the mastered John Adams entity.
The content of one of the source records for John Adams is then modified by the contact management system used to obtain the source records. During the next mastering flow run, Tamr Cloud determines that the modified source record no longer matches the records in its original cluster and moves it to a different cluster (Jonathan Adams). As a result, Tamr Cloud assigns that record to a new cluster with a new Tamr ID, and the records that remain in the original cluster retain the same Tamr ID, as shown in the Persistent ID columns in the images below.
Original cluster:
New cluster:
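To spot changes like those in the two examples above, one option is to compare the Tamr ID assigned to each source record before and after a mastering flow run. This is only a sketch under stated assumptions: the helper `diff_assignments` is hypothetical, the keys and IDs are made up, and it assumes you have already exported a mapping from source primary key to Tamr ID for each run by whatever means you use.

```python
def diff_assignments(before, after):
    """Compare two {primary_key: tamr_id} mappings (hypothetical helper).

    Returns the records whose Tamr ID changed between runs, and the
    primary keys that no longer appear in the later run.
    """
    changed = {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }
    dropped = sorted(key for key in before if key not in after)
    return changed, dropped


# Illustrative values only: record r3 moves to a different cluster,
# as the modified John Adams record does in Example 2.
before = {"r1": "T-100", "r2": "T-100", "r3": "T-100"}
after = {"r1": "T-100", "r2": "T-100", "r3": "T-200"}

changed, dropped = diff_assignments(before, after)
print(changed)
print(dropped)
```

In this sketch, a record that appears in `changed` moved between clusters, as in Example 2, while keys listed in `dropped` point to the primary key change described in Example 1.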