Modifying Source Record Clusters

You can override Tamr Cloud-computed clusters.

When you run a mastering flow, Tamr Cloud groups records that refer to the same entity into a cluster, using a trained model. Each entity can represent a cluster of one to thousands of records.

You can review the entities and their clustered source records for data products to which you have access. You can:

  • Quickly merge several entities.
  • Review similar entities and move individual source records between them.
  • Create entirely new entities from source records.

When you make any changes to record clusters, your changes are shown as pending until the flow is run, and then persist on future mastering flow runs. All applied manual changes are shown as verifications. See Verifying Source Records for more information about record verifications.

You can make multiple changes to a source record before the flow next runs.

When the flow runs, the Apply Clustering step clusters source records using the model, and then applies your overrides. Because overrides are applied after clustering, they do not impact how other source records are clustered.
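The ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Tamr Cloud code: the function name `apply_overrides` and the record and cluster identifiers are hypothetical.

```python
# Hypothetical illustration only; not Tamr Cloud code or its API.

def apply_overrides(model_clusters, overrides):
    """Return final assignments: model output first, manual overrides on top."""
    final = dict(model_clusters)      # copy the model's record -> cluster mapping
    final.update(overrides)           # overrides replace the model's choice per record
    return final


# Record id -> cluster chosen by the model (made-up values).
model_clusters = {"r1": "A", "r2": "A", "r3": "B"}
# A manual move of record r2 into cluster B.
overrides = {"r2": "B"}

final_clusters = apply_overrides(model_clusters, overrides)
print(final_clusters)  # {'r1': 'A', 'r2': 'B', 'r3': 'B'}
```

In this sketch only r2 changes. Records r1 and r3 keep the clusters the model assigned, which mirrors the statement that overrides do not affect how other source records are clustered.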

Note: When you change the source records for an entity, different values may be selected for entity attributes in the next flow run.

Merging Entities

If you know that two or more entities refer to the same real-world entity, you can quickly merge them. When you merge entities, you select the entity into which the others will be merged. The Tamr ID for the selected entity remains the same, and all source records from the other merged entities move to this entity. The entities merged into this entity are removed, along with their Tamr IDs. This is the same as manually moving every record from one entity into another.
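As a rough model of these merge semantics, the sketch below treats each entity as a list of source record IDs keyed by its Tamr ID. The function `merge_entities` and the IDs used are hypothetical and exist only to illustrate the behavior described above.

```python
# Hypothetical illustration only; not Tamr Cloud code or its API.

def merge_entities(entities, surviving_id, merged_ids):
    """Move all records from merged_ids into surviving_id and drop the emptied entities."""
    result = {eid: list(records) for eid, records in entities.items()}
    for eid in merged_ids:
        if eid == surviving_id:
            continue                                       # the survivor is never removed
        result[surviving_id].extend(result.pop(eid))       # move records, remove the entity
    return result


# Entity (Tamr ID) -> source record ids (made-up values).
entities = {"T1": ["r1", "r2"], "T2": ["r3"], "T3": ["r4"]}

merged = merge_entities(entities, surviving_id="T1", merged_ids=["T2", "T3"])
print(merged)  # {'T1': ['r1', 'r2', 'r3', 'r4']}
```

The surviving entity keeps its ID, and the other two IDs no longer appear in the result.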

To merge entities:

  1. Select the checkboxes for two or more entities to merge.
    You can sort and filter the table to find specific entities.
  2. From the dropdown, choose Actions > Merge.
  3. In the confirmation dialog, review the list of entities being merged and choose the entity to merge into. Then select Next.
  4. Select Merge.

Changes are shown as pending move until the flow is run. If you need to revert this change, see Reverting Cluster Changes below.

Example: Merging entities

In the example below, a user merged together three contact records for Eve Sanchez, selecting the mastered entity with the most source records as the surviving entity. Note that the three entities are marked as pending changes. The surviving entity is at the top, while the two entities that will be merged into the survivor are in italics below it.

Moving Source Records between Entities

You can compare the source records of several entities to determine whether they have been clustered correctly. You can then move the source records between entities as needed.

Important: If you move all source records out of an entity, that entity, along with its Tamr ID, will be removed the next time the flow is run.

To move source records between clusters:

  1. Open your data product.
  2. In the Entities table, select the rows of the entities whose source records you want to review, and then select Actions > Manage Cluster Details.
  3. If needed, use the Filter Source Records by Entity panel to select additional source record clusters for comparison, or to remove clusters that you no longer want to compare.
    Note: In the Filter Source Records by Entity panel, you can select set as primary next to the entity to switch the primary entity.
  4. To move records, select one or more records, and then either:
    1. Drag and drop the selected records to the correct entity. Confirm the move.
    2. Choose Actions > Move, and then confirm the records to move into the chosen entity.

Source record changes are shown as pending addition (+) or removal (-) until the flow is run. If you need to revert the changes, see Reverting Cluster Changes below.

In the top right corner, select Back to Viewing All Data to return to the main data table.

Example: Moving source records between clusters

In the example below, the user has selected two entities for comparison: Juliana Olie and Juliana Olie. After review, the user chooses to merge these two clusters. Notice that the moved record is marked as pending removal (-) from its original cluster, and pending addition (+) to the new cluster.

Creating a New Entity

On the Manage Cluster Details page, you can choose to create a new entity from one or more source records. This moves all of the selected records into the same new entity. For example, if you selected three source records, all three would be clustered into the same new entity. Your changes will be pending until the next time the flow is run.

To create a new entity:

  1. Open your data product.
  2. In the Entities table, select one or more entities whose source records you want to review, then select Actions > Manage Cluster Details.
    You can sort and filter the table to find specific entities.
  3. Select one or more source records to move to a new entity.
  4. Select Actions > Create New Entity.
  5. Confirm the source records you want to move into a new entity, then select Create.
  6. When you have completed all of your changes, select Close to save your changes.
  7. Navigate back to the entities table. Changes are shown in pending state until the flow is run.

These records are shown in pending new state. If you need to revert the changes, see Reverting Cluster Changes below.

When you have finished your changes, select Back to Viewing All Data in the top right to return to the main data table.

Example: Creating a new entity

In the example below, the user chooses to review the source records for the Alia Gutierrez entity, and decides that one of the records represents a different person, Alicia Gutierrez. After checking that there is no existing "Alicia Gutierrez" cluster that the records could be moved into, the user selects the record and chooses the Create New Entity action. Note that the moved record is marked as pending new.

Reverting Cluster Changes

You can revert any changes made to source record clusters before the next flow is run. Reverting a change returns the source record to the cluster to which it was assigned in the previous flow run. For example, if you move a source record from cluster A to cluster B, and then move it to cluster C, revert will return it to cluster A.
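The revert behavior in the example above (A to B, then to C, then back to A) can be modeled as follows. This is a minimal sketch under the assumption that only the latest pending move per record matters; the class `PendingMoves` and its methods are hypothetical names, not part of Tamr Cloud.

```python
# Hypothetical illustration only; not Tamr Cloud code or its API.

class PendingMoves:
    """Track pending moves on top of the assignments from the previous flow run."""

    def __init__(self, last_run_clusters):
        self.last_run = dict(last_run_clusters)  # record id -> cluster from the last run
        self.pending = {}                        # record id -> pending target cluster

    def move(self, record_id, target_cluster):
        # A later move simply replaces an earlier pending move for the same record.
        self.pending[record_id] = target_cluster

    def revert(self, record_id):
        # Reverting discards the pending move, whatever intermediate moves were made.
        self.pending.pop(record_id, None)

    def current_cluster(self, record_id):
        return self.pending.get(record_id, self.last_run[record_id])


moves = PendingMoves({"r1": "A"})
moves.move("r1", "B")
moves.move("r1", "C")
moves.revert("r1")
print(moves.current_cluster("r1"))  # A
```

Because the sketch stores the previous run's assignment separately from pending moves, a revert always lands on cluster A rather than on the intermediate cluster B.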

When you revert a change, the reverted source record is marked as pending verification in the original cluster. See Verifying Source Records.

Revert changes from the Manage Cluster Details page.

To revert a cluster change:

  1. On the Manage Cluster Details page, select the checkbox for the record pending change.
  2. Select Actions > Revert Record Moves.

You can also move records multiple times between flow runs; instead of reverting the change, you can simply move the record to its original cluster or a different cluster.

Example: Reverting a moved source record

In the example below, the user reverts a change that moved a Juliana Olie source record from one cluster to another. Note that after the user reverts the change, the record is marked as pending verification in its original cluster in the next flow run.

Viewing Cluster Override History

You can track cluster override actions in the activity log.

Additionally, after you run the flow to apply cluster overrides, you can view both the cluster assigned to a source record by the clustering model and the cluster to which the record was moved by an override in the following places:

  • Apply Clustering step output.
  • Source record tables.
  • Source Records by Entity exported dataset.

The cluster assigned by the clustering model is stored in the suggestedClusterId field.
The cluster to which a record has been moved through overrides is stored in the verifiedClusterId field.

See About Persistent Identifier Fields for more information on these fields.
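One way to use the two fields is to list the records whose final cluster differs from the model's suggestion. The sketch below assumes the exported rows have already been loaded as dictionaries, that `verifiedClusterId` may be empty for records without an override, and that a `recordId` column exists. Those details are assumptions made for illustration; only the `suggestedClusterId` and `verifiedClusterId` field names come from the text above.

```python
# Hypothetical illustration only; row layout and the recordId key are assumptions.

def overridden_records(rows):
    """Return rows whose verified cluster is set and differs from the suggested one."""
    moved = []
    for row in rows:
        suggested = row.get("suggestedClusterId")
        verified = row.get("verifiedClusterId")
        if verified and verified != suggested:
            moved.append(row)
    return moved


# Made-up sample rows standing in for an exported dataset.
rows = [
    {"recordId": "r1", "suggestedClusterId": "A", "verifiedClusterId": "A"},
    {"recordId": "r2", "suggestedClusterId": "A", "verifiedClusterId": "B"},
    {"recordId": "r3", "suggestedClusterId": "B", "verifiedClusterId": None},
]

moved_ids = [row["recordId"] for row in overridden_records(rows)]
print(moved_ids)  # ['r2']
```

Check the actual export schema before relying on this comparison, because how the verified field is populated for records without overrides is not stated here.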