Modifying Source Record Clusters

You can override Tamr Cloud-computed clusters.

When you run a mastering flow, Tamr Cloud groups records that refer to the same entity into a cluster, using a trained model. Each entity can represent a cluster of one to thousands of records.

You can review the entities and their clustered source records for data products to which you have access. You can:

  • Quickly merge several entities.
  • Review similar entities and move individual source records between them.
  • Create entirely new entities from source records.

Changes that you make to record clusters remain pending until the next flow run. After they are applied, they persist across future mastering flow runs, and the affected source records are marked as verified in the cluster. See Verifying Source Records for more information about record verifications.

You can make multiple changes to a source record before the flow next runs.

When the flow runs, the Apply Clustering step clusters source records using the model, and then applies your overrides. Because overrides are applied after clustering, they do not impact how other source records are clustered.
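The ordering described above can be pictured with a short Python sketch. It is an illustrative model only, not Tamr Cloud's implementation or API: the function name apply_clustering and the dictionary shapes are assumptions made for this example.

```python
def apply_clustering(model_assignments, overrides):
    """Return the final record -> entity assignments for one flow run.

    model_assignments: dict of record ID -> entity ID chosen by the model.
    overrides: dict of record ID -> entity ID chosen by a user.
    Both shapes are hypothetical and exist only for illustration.
    """
    final = dict(model_assignments)  # step 1: start from the model's clusters
    final.update(overrides)          # step 2: overrides win for overridden records
    return final


model = {"r1": "E1", "r2": "E1", "r3": "E2"}
final = apply_clustering(model, {"r2": "E2"})
print(final)
```

In this sketch only r2 changes entity. The records r1 and r3 keep whatever the model assigned, which mirrors the statement that overrides do not affect how other source records are clustered.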

Note: When you change the source records for an entity, different values may be selected for entity attributes in the next flow run.

Merging Entities

If you know that two or more entities refer to the same real-world entity, you can quickly merge them. When you merge entities, you select the entity into which the others will be merged. The Tamr ID for the selected entity remains the same, and all source records from the other merged entities move to this entity. The entities merged into this entity are removed, along with their Tamr IDs. The result is the same as manually moving every record from the other entities into the selected entity.
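As a rough mental model of these merge semantics, consider the following Python sketch. The function merge_entities and the entity and record IDs are hypothetical. The sketch does not call any Tamr Cloud API.

```python
def merge_entities(clusters, target_id, other_ids):
    """Merge the entities in other_ids into target_id.

    clusters: dict of entity ID -> list of source record IDs
    (a hypothetical shape chosen for this illustration).
    """
    merged = {eid: list(records) for eid, records in clusters.items()}
    for eid in other_ids:
        if eid == target_id:
            continue  # the target keeps its own ID and records
        # Move every record to the target and drop the emptied entity.
        merged[target_id].extend(merged.pop(eid))
    return merged


clusters = {"E1": ["r1"], "E2": ["r2", "r3"], "E3": ["r4"]}
merged = merge_entities(clusters, "E1", ["E2"])
print(merged)
```

Here E1 keeps its ID and gains the records of E2, while E2 disappears from the result. E3 is not part of the merge and is unchanged.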

To merge entities:

  1. Select the checkboxes for two or more entities to merge.
    You can sort and filter the table to find specific entities.
  2. From the dropdown, choose Actions > Merge.
  3. In the confirmation dialog, review the list of entities being merged and choose the entity to merge into. Then select Next.
  4. Select Merge.

Changes remain pending until the flow is run. If you need to revert this change, see Reverting Cluster Changes below.

Moving Source Records between Entities

You can compare the source records of several entities to determine whether they have been clustered correctly. You can then move the source records between entities as needed.

Important: If you move all source records out of an entity, that entity, along with its Tamr ID, will be removed the next time the flow is run.
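One way to picture why an emptied entity goes away: if entities are derived by grouping source records by their assigned entity, an entity with no records simply never appears. The Python sketch below illustrates that idea under this assumption. The helper group_by_entity and all IDs are hypothetical, and this is not a description of how Tamr Cloud is implemented.

```python
def group_by_entity(assignments):
    """Group record IDs by entity ID.

    assignments: dict of record ID -> entity ID (hypothetical shape).
    """
    clusters = {}
    for record_id, entity_id in assignments.items():
        clusters.setdefault(entity_id, []).append(record_id)
    return clusters


before = {"r1": "E1", "r2": "E2"}
after = {**before, "r2": "E1"}  # move the only record of E2 into E1

print(group_by_entity(before))
print(group_by_entity(after))
```

After the move, E2 has no source records left, so it no longer shows up in the grouped result.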

To move source records between clusters:

  1. Open the data product.
  2. In the Entities table, select the rows of the entities whose source records you want to review, and then select Actions > Manage Cluster Details.
  3. If needed, use the Filter Source Records by Entity panel to add source record clusters for comparison, or to remove clusters that you no longer want to compare.
    Note: In the Filter Source Records by Entity panel, you can select set as primary next to the entity to switch the primary entity.
  4. To move records, select one or more records, and then do one of the following:
    1. Drag and drop the selected records onto their correct entity. Confirm the move.
    2. Choose Actions > Move, then confirm the records to move into the chosen entity.

Source record changes are pending until the flow is run. If you need to revert the changes, see Reverting Cluster Changes below.

In the top right corner, select Back to Viewing All Data to return to the main data table.

Example: Moving source records between clusters

In the example below, the user has selected two entities for comparison, both named Juliana Olie. After review, the user chooses to combine these two clusters by moving a source record from one to the other. Notice that the moved record is marked as pending removal (-) from its original cluster, and pending addition (+) to the new cluster.

Creating a New Entity

On the Manage Cluster Details page, you can choose to create a new entity from one or more source records. This moves all of the selected records into the same new entity. For example, if you select three source records, all three are clustered into the same new entity. Your changes will be pending until the next time the flow is run.
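The "all selected records go to one new entity" behavior can be sketched in Python as follows. The function create_new_entity and the placeholder ID "NEW-1" are hypothetical. How Tamr Cloud assigns the Tamr ID of a new entity is not covered here.

```python
def create_new_entity(assignments, record_ids, new_entity_id):
    """Assign every selected record to the same new entity.

    assignments: dict of record ID -> entity ID (hypothetical shape).
    new_entity_id: placeholder ID for the new entity (an assumption).
    """
    updated = dict(assignments)
    for record_id in record_ids:
        updated[record_id] = new_entity_id
    return updated


assignments = {"r1": "E1", "r2": "E1", "r3": "E2", "r4": "E2"}
updated = create_new_entity(assignments, ["r1", "r3", "r4"], "NEW-1")
print(updated)
```

All three selected records end up in the single placeholder entity, regardless of which entity each came from. The unselected record r2 stays where it was.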

To create a new entity:

  1. Open the data product.
  2. In the Entities table, select one or more entities whose source records you want to review, then select Actions > Manage Cluster Details.
    You can sort and filter the table to find specific entities.
  3. Select one or more source records to move to a new entity.
  4. Select Actions > Create New Entity.
  5. Confirm the source records you want to move into a new entity, then select Create.
  6. When you have completed all of your changes, select Close to save your changes.
  7. Navigate back to the Entities table. Changes are pending until the flow is run.

If you need to revert the changes, see Reverting Cluster Changes below.

When you have finished your changes, select Back to Viewing All Data in the top right to return to the main data table.

Example: Creating a new entity

In the example below, the user reviews the source records for the Alia Gutierrez entity and decides that one of the records represents a different person, Alicia Gutierrez. After checking that there is no existing "Alicia Gutierrez" cluster that the record could be moved into, the user selects the record and chooses the Create New Entity action. Note that the moved record is marked as pending new.

Reverting Cluster Changes

You can revert any changes made to source record clusters before the next flow is run. Reverting a change returns the source record to the cluster to which it was assigned in the previous flow run. For example, if you move a source record from cluster A to cluster B, and then move it to cluster C, revert will return it to cluster A.
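The revert rule can be modeled in Python by keeping the previous run's assignments as a baseline and storing only the latest pending move for each record. The class PendingMoves below is a hypothetical illustration of that rule, not Tamr Cloud code.

```python
class PendingMoves:
    """Track pending record moves against the previous flow run.

    All names here are hypothetical and exist only for this sketch.
    """

    def __init__(self, last_run_assignments):
        self.baseline = dict(last_run_assignments)  # record ID -> entity ID
        self.pending = {}                           # latest pending move per record

    def move(self, record_id, entity_id):
        # A later move replaces any earlier pending move for the same record.
        self.pending[record_id] = entity_id

    def revert(self, record_id):
        # Drop the pending move; the record falls back to the baseline.
        self.pending.pop(record_id, None)

    def current(self, record_id):
        return self.pending.get(record_id, self.baseline[record_id])


moves = PendingMoves({"r1": "A"})
moves.move("r1", "B")
moves.move("r1", "C")
moves.revert("r1")
print(moves.current("r1"))
```

Because only the baseline and the latest pending move are kept, reverting after the A to B to C sequence returns the record to cluster A rather than to B.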

When you revert a change, the reverted source record is marked as pending verification in the original cluster. See Verifying Source Records.

Revert changes from the Manage Cluster Details page.

To revert a cluster change:

  1. On the Manage Cluster Details page, select the checkbox for the record pending change.
  2. Select Actions > Revert Record Moves.

You can also move records multiple times between flow runs; instead of reverting the change, you can simply move the record to its original cluster or a different cluster.

Example: Reverting a moved source record

In the example below, the user reverts the move of a Juliana Olie source record from one cluster to another. Note that after the user reverts the change, the record is marked as pending verification in its original cluster.

Viewing Cluster Override History

You can track cluster override actions in the activity log.