Modifying Source Record Clusters

You can override Tamr Cloud-computed clusters.

When you run a mastering flow, Tamr Cloud uses a trained model to group records that refer to the same real-world entity into a cluster. Each entity represents a cluster of anywhere from one to thousands of source records.

You can review the entities and their clustered source records for any data product you have access to. If you determine that several entities should be combined into a single entity, you can merge them. You can also manage an entity's source records in more detail: for each entity, you can review similar entities and move individual source records between them, or create an entirely new entity from source records of the primary entity that you are viewing.

On the Manage Cluster Details page, select Show mastered entity details in the top right to see the details for the entity you are viewing, or select Hide mastered entity details to hide them.

When you merge entities, move records between entities, or create new entities, your changes are applied by the Clustering step the next time the flow is run. First, source records are clustered using the model, and then the overrides are applied. Because overrides are applied after clustering, they do not impact how other source records are clustered. Your changes are shown as pending until the flow is run, and then persist on future mastering flow runs.
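The ordering described above (model first, overrides second) can be illustrated with a small sketch. This is a hypothetical in-memory model, not Tamr Cloud's API or implementation: `apply_overrides` and its dictionaries are illustrative names. It shows why an override changes only the record it names and leaves every other record's model-assigned cluster untouched.

```python
from typing import Dict


def apply_overrides(
    suggested: Dict[str, str],  # record ID -> cluster ID chosen by the model
    overrides: Dict[str, str],  # record ID -> cluster ID chosen by a user override
) -> Dict[str, str]:
    """Return the final record -> cluster assignment (illustrative only)."""
    # Start from the model's output; clustering has already finished.
    final = dict(suggested)
    for record_id, cluster_id in overrides.items():
        # Assumption for this sketch: overrides for records that are not
        # present in this run are skipped.
        if record_id in final:
            final[record_id] = cluster_id
    return final
```

For example, if the model places records `r1` and `r2` in cluster `A` and an override moves `r2` to `B`, only `r2` changes; `r1` stays in `A`.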

Note that changing the source records for an entity can cause different source record values to be selected for entity fields when the flow is run.

Merging Entities

If you know that two or more entities refer to the same real-world entity, you can quickly merge them. When you merge entities, you select the entity into which the others are merged. The selected entity keeps its Tamr ID, and all source records from the other entities move to it. The entities merged into it are removed, along with their Tamr IDs.
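The merge rule described above can be sketched as follows. This is a hypothetical model for illustration only, not a Tamr Cloud API: entities are represented as a dictionary mapping a Tamr ID to a set of source record IDs, and `merge_entities` is an invented name. The target keeps its ID and absorbs every record; the other selected IDs disappear.

```python
from typing import Dict, List, Set


def merge_entities(
    clusters: Dict[str, Set[str]],  # Tamr ID -> source record IDs (hypothetical model)
    to_merge: List[str],            # Tamr IDs of the selected entities
    target: str,                    # Tamr ID chosen as the entity to merge into
) -> Dict[str, Set[str]]:
    if target not in to_merge:
        raise ValueError("target must be one of the selected entities")
    # Copy the sets so the caller's data is not modified.
    result = {tid: set(recs) for tid, recs in clusters.items()}
    for tid in to_merge:
        if tid != target:
            # Move all records into the target and drop the merged-away Tamr ID.
            result[target] |= result.pop(tid)
    return result
```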

To merge entities:

  1. Select the checkboxes for two or more entities to merge.
    You can sort and filter the table to find specific entities.
  2. From the dropdown, choose Actions > Merge.
  3. In the confirmation dialog, review the list of entities being merged and choose the entity to merge into. Then select Next.
  4. Select Merge.

Changes are shown with a Record pending move status until the flow is run.

Moving Source Records between Entities

If you need to compare the source records of several entities to determine whether each record has been clustered into the correct entity, use the Move option to review the source records and move them between entities. The primary entity's details appear on the Manage Cluster Details page. Other selected entities are listed in the collapsible Filter Source Records by Entity panel on the left, where you can add more entities for comparison. Your changes are pending until the next time the flow is run.

Important: If you move all source records out of an entity, that entity and its Tamr ID are removed the next time the flow is run.
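A minimal sketch of moving records, including the removal rule in the note above, assuming the same hypothetical model (Tamr ID mapped to a set of source record IDs). `move_records` is an illustrative name, not part of any Tamr Cloud interface.

```python
from typing import Dict, Iterable, Set


def move_records(
    clusters: Dict[str, Set[str]],  # Tamr ID -> source record IDs (hypothetical model)
    record_ids: Iterable[str],      # records selected for the move
    destination: str,               # Tamr ID of the entity receiving the records
) -> Dict[str, Set[str]]:
    moving = set(record_ids)
    # Take the selected records out of every entity (copies, not in place).
    result = {tid: recs - moving for tid, recs in clusters.items()}
    # Add them to the destination entity.
    result.setdefault(destination, set()).update(moving)
    # An entity left with no source records is removed together with its Tamr ID.
    return {tid: recs for tid, recs in result.items() if recs}
```

Moving the only record of `t2` into `t1`, for example, leaves just `t1` in the result.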

To move source records between clusters:

  1. Open your data product.
  2. In the Entities tab, select the rows of the entities whose source records you want to review, then select Actions > Move. Alternatively, select an entity to open it, and then select Manage Cluster Details.
  3. In the Filter Source Records by Entity panel, select Add to add more clusters and their source records for comparison. Choose Remove to remove clusters from view.
  4. To move records, select one or more records, and then either:
    • Drag and drop the selected records onto the correct entity, then confirm the move.
    • Choose Actions > Move, then confirm the records to move into the chosen entity.
  5. To undo your changes, select Reset. When you have completed all of your changes, select Save. Changes are shown with a Record pending move status until the flow is run.

Note: In the Filter Source Records by Entity panel, select Set as primary next to an entity to make it the primary entity. Select the launch icon to open the Entity Details page for that entity in a new tab.

In the top right corner, select Back to Viewing All Data to return to the main data table.

Creating a New Entity

You can create a new entity from one or more source records of the primary entity that you are viewing, which moves those records from the current entity to the new one. This is useful when clustered source records represent neither the entity they currently belong to nor any other existing entity, but rather a new entity. Your changes are pending until the next time the flow is run.
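The operation above can be sketched with the same hypothetical model used for merging and moving (Tamr ID mapped to a set of source record IDs). This is illustrative only: real Tamr IDs are assigned by Tamr Cloud, and the UUID generator here is an assumption standing in for that. The `new_id` parameter is a hypothetical hook so the ID source can be swapped out.

```python
import uuid
from typing import Callable, Dict, Iterable, Set, Tuple


def create_new_entity(
    clusters: Dict[str, Set[str]],  # Tamr ID -> source record IDs (hypothetical model)
    primary: str,                   # Tamr ID of the entity being viewed
    record_ids: Iterable[str],      # records selected from the primary entity
    new_id: Callable[[], str] = lambda: str(uuid.uuid4()),  # assumed ID source
) -> Tuple[Dict[str, Set[str]], str]:
    selected = set(record_ids)
    # Only records that belong to the primary entity can be moved this way.
    if not selected or not selected <= clusters[primary]:
        raise ValueError("select one or more source records from the primary entity")
    tid = new_id()
    result = {t: set(recs) for t, recs in clusters.items()}
    result[primary] -= selected  # remove the records from the current entity
    result[tid] = selected       # and place them in the new entity
    return result, tid
```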

To create a new entity:

  1. Open your data product.
  2. In the Entities tab, select the entity whose source records you want to review, then navigate to the Manage Cluster Details tab.
    You can sort and filter the table to find specific entities.
  3. Select one or more source records from the primary entity to move to a new entity.
  4. Select Actions > Create New Entity.
  5. Confirm the source records you want to move into a new entity, then select Create.
    These records are shown with a Record pending new entity status.
  6. When you have completed all of your changes, select Close to save your changes.
  7. Navigate back to the entities table. Changes are shown in pending state until the flow is run.

You must run the flow to apply your changes. Until you run the flow, your changes remain pending.

Note: After confirming that you want to create the new entity, you cannot use the Reset option to undo this change. If you do need to revert this change, you can merge the new entity with its previous entity after running the flow.

In the top right corner, select Back to Viewing All Data to return to the main data table.

Viewing Cluster Override History

After you run the flow to apply cluster overrides, you can view both the cluster that the clustering model assigned to a source record and the cluster to which an override moved that record. These values appear in the following places:

  • Apply Clustering step output.
  • Source record tables.
  • Source Records by Entity exported dataset.

The cluster assigned by the clustering model is stored in the suggestedClusterId field.
The cluster to which a record has been moved through overrides is stored in the verifiedClusterId field.
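As an illustration, the two fields can be compared to list the records that an override placed in a different cluster than the model chose. This sketch assumes each exported row has been loaded as a Python dictionary keyed by field name, with a null value represented as `None`; how your export encodes nulls may differ. `moved_records` is a hypothetical helper name.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # one source record row, keyed by field name (assumed)


def moved_records(rows: List[Row]) -> List[Row]:
    """Return rows whose override placed them in a different cluster than the model did."""
    return [
        row
        for row in rows
        if row.get("verifiedClusterId") is not None
        and row["verifiedClusterId"] != row.get("suggestedClusterId")
    ]
```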

See About Persistent Identifier Fields for more information on these fields.

Confirming Cluster Overrides

After running a flow, you can confirm that your cluster overrides were applied by checking several field values.

  1. Open the entity containing the source records that you applied changes to.
  2. Open the Source Records tab.
  3. Scroll across the source record table until you reach the persistentId, suggestedClusterId, ruleClusterId, verifiedClusterId, and verificationType columns.
  4. Reference the cases below to confirm that your intended override was applied.

Case 1: No Overrides Applied

If no cluster overrides were applied, you see the following:

  • persistentId, suggestedClusterId, and ruleClusterId have the same value
  • verifiedClusterId and verificationType values are both null

Case 2: New Entity Created

If you created a new entity from clustered source records, you see the following in the source records for the new entity:

  • persistentId matches the verifiedClusterId
    This is the Tamr ID for the newly created entity.
  • suggestedClusterId matches the ruleClusterId
    The value for these columns is the original cluster for the source records.
  • verificationType is SUGGEST
    SUGGEST indicates that an override was made.

Case 3: Source Records Moved to Different Cluster

If you moved source records to a different cluster, you see the following in the source records that were moved:

  • persistentId and verifiedClusterId are the same
    This is the Tamr ID of the entity to which the source records have been moved.
  • persistentId and suggestedClusterId are different
    The suggestedClusterId value is the original cluster for the source records.
  • verificationType is SUGGEST
    SUGGEST indicates that an override was made.
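If you check many records, the cases above can be expressed as a small check. This is a sketch under stated assumptions: each row is a Python dictionary keyed by the column names above, with null values as `None`, and `classify_override` is a hypothetical helper. Because Cases 2 and 3 share the same pattern in these columns, the sketch reports both as `"override_applied"`; telling them apart requires knowing whether the destination entity was newly created.

```python
from typing import Dict, Optional


def classify_override(row: Dict[str, Optional[str]]) -> str:
    persistent = row.get("persistentId")
    suggested = row.get("suggestedClusterId")
    rule = row.get("ruleClusterId")
    verified = row.get("verifiedClusterId")
    vtype = row.get("verificationType")

    if verified is None and vtype is None:
        # Case 1: no override, so all three cluster IDs should agree.
        return "no_override" if persistent == suggested == rule else "unexpected"
    if vtype == "SUGGEST" and persistent == verified and persistent != suggested:
        # Case 2 or 3: the record now belongs to the cluster chosen by the override.
        return "override_applied"
    return "unexpected"
```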