Modifying Source Record Clusters

You can override Tamr Cloud-computed clusters.

When you run a mastering flow, Tamr Cloud groups records that refer to the same entity into a cluster, using a trained model. Each entity can represent a cluster of one to thousands of records.

You can review the entities and their clustered source records for data products to which you have access. You can:

  • Quickly merge several entities.
  • Review similar entities and move individual source records between them.
  • Create entirely new entities from source records.

When you make any changes to record clusters, your changes are pending until the flow is run, and then persist on future mastering flow runs. After the changes are applied, the source records are marked as verified in the cluster. By default, all records in both the original and destination entity are marked as verified. See Verifying Source Records for more information about record verifications.

You can make multiple changes to a source record before the flow next runs.

When the flow runs, the Apply Clustering step clusters source records using the model, and then applies your overrides. Because overrides are applied after clustering, they do not impact how other source records are clustered.
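The ordering described above can be pictured with a minimal Python sketch. This is a conceptual model only, not Tamr Cloud code or its API: the function name apply_overrides and the dictionary shapes are assumptions made for illustration.

```python
def apply_overrides(model_clusters, overrides):
    """Return the final record-to-entity assignments for a flow run.

    model_clusters: dict mapping record ID -> entity ID chosen by the model.
    overrides: dict mapping record ID -> entity ID chosen by a user.
    Both shapes are hypothetical and exist only for this illustration.
    """
    final = dict(model_clusters)  # start from a copy of the model's clustering
    final.update(overrides)       # overrides are applied last, so they win
    return final


model_clusters = {"r1": "A", "r2": "A", "r3": "B"}
overrides = {"r2": "B"}  # a user moved record r2 into entity B
final = apply_overrides(model_clusters, overrides)
```

In this sketch only r2 changes entity. The model's assignments for r1 and r3 are untouched, which mirrors the statement that overrides do not affect how other source records are clustered.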

Note: When you change the source records for an entity, different values may be selected for entity attributes in the next flow run.

Merging Entities

If you know that two or more entities refer to the same real-world entity, you can quickly merge them. When you merge entities, you select the entity into which the others will be merged. The Tamr ID for the surviving entity remains the same, and all source records from the other merged entities move to this entity. The entities merged into the surviving entity are removed, along with their Tamr IDs. The result is the same as manually moving every source record from the other entities into the surviving entity.

When you merge entities, all source records are automatically verified in the surviving entity.
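As a rough illustration of the merge behavior described above, the following Python sketch models entities as sets of record IDs. It is not Tamr Cloud code: the function merge_entities, the entity IDs, and the data shape are all hypothetical.

```python
def merge_entities(clusters, survivor_id, merged_ids):
    """Merge entities into a surviving entity.

    clusters: dict mapping entity ID -> set of record IDs (hypothetical shape).
    Returns a new dict; the input is left unchanged.
    """
    result = {eid: set(records) for eid, records in clusters.items()}
    for eid in merged_ids:
        if eid == survivor_id:
            continue  # the survivor cannot be merged into itself
        # Move every record to the survivor and remove the merged entity's ID.
        result[survivor_id] |= result.pop(eid)
    return result


clusters = {"E1": {"r1", "r2"}, "E2": {"r3"}, "E3": {"r4"}}
merged = merge_entities(clusters, "E1", ["E2"])
```

Here E1 keeps its ID and gains r3, E2 no longer exists, and E3 is unaffected.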

To merge entities:

  1. Select the checkboxes for two or more entities to merge.
    You can sort and filter the table to find specific entities.
  2. From the dropdown, choose Actions > Merge.
  3. In the confirmation dialog, review the list of entities being merged and choose the entity to merge into. Then select Next.
  4. Select Merge.

Your changes will be applied the next time the flow is run. If you need to revert this change before the flow is run, see Reverting Changes below.

Moving Source Records between Entities

You can compare the source records of several entities to determine whether they have been clustered correctly. You can then move the source records between entities as needed.

When you move source records between entities, all moved records are verified in the new cluster. By default, Tamr also verifies all records in the original and destination entities. You can choose not to verify records in one or both entities when moving records.

Important: If you move all source records out of an entity, that entity, along with its Tamr ID, will be removed the next time the flow is run.
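The removal of emptied entities can be sketched in Python as follows. This is an illustrative model under assumed names (move_records and the entity-to-records dictionary are hypothetical), not a description of how Tamr Cloud is implemented.

```python
def move_records(clusters, record_ids, destination_id):
    """Move records to a destination entity and drop any entity left empty.

    clusters: dict mapping entity ID -> set of record IDs (hypothetical shape).
    Returns a new dict; the input is left unchanged.
    """
    moving = set(record_ids)
    # Take the moved records out of every entity that currently holds them.
    result = {eid: set(records) - moving for eid, records in clusters.items()}
    # Add them to the destination, creating it if it does not exist yet.
    result.setdefault(destination_id, set()).update(moving)
    # An entity with no remaining records is removed, along with its ID.
    return {eid: records for eid, records in result.items() if records}


clusters = {"A": {"r1"}, "B": {"r2"}}
after_move = move_records(clusters, ["r1"], "B")
```

Because r1 was the only record in entity A, A does not appear in the result.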

To move source records between clusters:

  1. Open the data product.
  2. In the Entities table, select the rows of the entities whose source records you want to review, and then select Actions > Manage Cluster Details.
  3. If needed, select additional source record clusters for comparison from the Filter Source Records by Entity panel, or remove clusters that you no longer want to compare.
    Note: In the Filter Source Records by Entity panel, you can select set as primary next to the entity to switch the primary entity.
  4. To move records, select one or more records, and either:
    1. Drag and drop the selected records to their correct entity.
    2. Select records, then choose Actions > Move.
  5. If necessary, select the entity to which to move the records.
  6. By default, the options to verify all records in the original and destination entity are enabled. Verified records will not move between clusters during flow runs. Disable one or both of these options if you do not want the records to be verified.
  7. Select Merge.

Your changes will be applied the next time the flow is run. If you need to revert the changes before the flow is run, see Reverting Changes below.

Tip: You can also review clusters and move records by selecting records on the Source Records page and choosing Actions > Move Records.

Creating a New Entity

On the Manage Cluster Details page, you can choose to create a new entity from one or more source records. This moves all of the selected records into the same new entity. For example, if you select three source records, all three are clustered into the same new entity.

When you create a new entity, all records are verified in the new entity. By default, Tamr also verifies all records in the original entity. You can choose not to verify records in the original entity.

To create a new entity:

  1. Open the data product.
  2. In the Entities table, select one or more entities whose source records you want to review, then select Actions > Manage Cluster Details.
    You can sort and filter the table to find specific entities.
  3. Select one or more source records to move to a new entity.
  4. Select Actions > Create New Entity.
  5. Confirm the source records you want to move into a new entity.
  6. By default, the option to verify all records in the original entity is enabled. Verified records will not move between clusters during flow runs. Disable this option if you do not want the records in the original cluster to be verified.
  7. Select Create.

Your changes will be applied the next time the flow is run. If you need to revert the changes before the flow is run, see Reverting Changes below.

Tip: You can also create new entities by selecting records on the Source Records page and choosing Actions > Create Entity.

Reverting Cluster Changes

You can revert any changes made to source record clusters before the flow is next run. Reverting a change returns the source record to the cluster to which it was assigned in the previous flow run. For example, if you move a source record from cluster A to cluster B, and then move it to cluster C, reverting returns it to cluster A.

When you revert a change, the reverted source record is marked as pending verification in the original cluster. See Verifying Source Records.
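The A, B, C example above can be modeled with a short Python sketch. It assumes, for illustration only, that pending moves are tracked per record and that a revert discards them. The PendingMoves class and its methods are hypothetical and are not part of Tamr Cloud.

```python
class PendingMoves:
    """Tracks record moves made since the last flow run (illustrative only)."""

    def __init__(self, last_run_assignments):
        # record ID -> entity ID as assigned in the previous flow run
        self.last_run = dict(last_run_assignments)
        # record ID -> entity ID for moves not yet applied by a flow run
        self.pending = {}

    def move(self, record_id, entity_id):
        # A later move replaces any earlier pending move for the same record.
        self.pending[record_id] = entity_id

    def revert(self, record_id):
        # Discard the pending move, if there is one.
        self.pending.pop(record_id, None)

    def current(self, record_id):
        # Use the pending entity if one exists, else the last-run entity.
        return self.pending.get(record_id, self.last_run[record_id])


moves = PendingMoves({"r1": "A"})
moves.move("r1", "B")
moves.move("r1", "C")
before_revert = moves.current("r1")
moves.revert("r1")
after_revert = moves.current("r1")
```

Before the revert the record is pending in cluster C. After the revert it is back in cluster A, its assignment from the previous flow run, not in B.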

Revert changes from the Manage Cluster Details page.

To revert a cluster change:

  1. On the Manage Cluster Details page, select the checkbox for the record pending change.
  2. Select Actions > Revert Record Moves.

You can also move records multiple times between flow runs; instead of reverting the change, you can simply move the record to its original cluster or a different cluster.

Viewing Cluster Override History

You can track cluster override actions in the activity log.