Curating Entity Type Data

As part of the entity type mastering flow, Tamr Cloud uses a trained model to group records that refer to the same entity into a cluster. A cluster can contain anywhere from a single record to thousands of records.

Tamr Cloud also applies rules that produce, for each cluster, a single entity that best represents it. These rules determine the most appropriate value for each entity field.

If you have curator (or higher) permissions for an entity type, you can review and, if necessary, override the Tamr Cloud-computed clusters and field values for that entity type.

Curating Data

Overriding Field Values

When you override a field value, the new value appears in Studio automatically. The override is persistent: it is not overwritten when mastering flows deliver updated data to Studio, and when the mastering flow is re-run, the overridden value persists in Studio, Curator, and published datasets.
See Editing Field Values.
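The persistence rule above can be pictured as a simple precedence check: the mastering flow may compute a different value on every run, but a stored curator override always takes priority. The following is a minimal sketch under that assumption; `resolve_field`, `publish`, and the `overrides` store are hypothetical names for illustration and are not part of any Tamr Cloud API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical store of curator overrides, keyed by (entity_id, field).
# It is kept separately from computed values, so re-runs never touch it.
overrides: Dict[Tuple[str, str], str] = {("entity-1", "name"): "Acme Corporation"}


def resolve_field(computed: Optional[str], override: Optional[str]) -> Optional[str]:
    """Return the value shown for a field: an override wins over the computed value."""
    return override if override is not None else computed


def publish(entity_id: str, field: str, computed: Optional[str]) -> Optional[str]:
    """Combine a freshly computed value with any stored override (illustrative only)."""
    return resolve_field(computed, overrides.get((entity_id, field)))


# A first run and a later re-run compute different names; the override persists.
print(publish("entity-1", "name", "ACME Corp"))
print(publish("entity-1", "name", "Acme Corp."))
# Fields without an override keep whatever the flow computed.
print(publish("entity-1", "city", "Boston"))
```

In this sketch `None` means "no override", so it cannot express an override that deliberately clears a value; a real system would need a separate marker for that case.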

Overriding Source Record Clusters

When you override source record clusters, either by merging entities or by moving source records between entities, the Clustering step in Designer applies these changes the next time the flow runs. The changes then persist on subsequent mastering flow runs. See Managing Entity Record Clusters.
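One way to picture how cluster overrides survive a re-run is as a post-processing pass over the model's record-to-entity assignments: curator moves pin individual records, and curator merges redirect one entity to another. This is a conceptual sketch only, not how the Clustering step is actually implemented; `apply_cluster_overrides` is a hypothetical function, and the sketch assumes entity IDs are stable between runs.

```python
from typing import Dict


def apply_cluster_overrides(
    clusters: Dict[str, str],  # record_id -> entity_id proposed by the trained model
    moves: Dict[str, str],     # record_id -> entity_id chosen by a curator (hypothetical)
    merges: Dict[str, str],    # entity_id -> entity_id it was merged into (hypothetical)
) -> Dict[str, str]:
    """Apply curator moves and merges on top of freshly computed clusters."""
    result: Dict[str, str] = {}
    for record_id, entity_id in clusters.items():
        # A curator move pins the record to the entity the curator chose.
        entity_id = moves.get(record_id, entity_id)
        # Follow merge chains (E2 -> E1 -> ...) with a guard against cycles.
        seen = set()
        while entity_id in merges and entity_id not in seen:
            seen.add(entity_id)
            entity_id = merges[entity_id]
        result[record_id] = entity_id
    return result


model_output = {"r1": "E1", "r2": "E2", "r3": "E2", "r4": "E3"}
final = apply_cluster_overrides(model_output, moves={"r3": "E3"}, merges={"E2": "E1"})
print(final)  # r2 follows the E2 -> E1 merge; r3 stays where the curator moved it
```

Because the overrides are stored apart from the model output and reapplied on each run, the same curator decisions hold even when the model proposes new clusters.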

Metrics in Curator

Entity Type Metrics

The following metrics are available in Curator for each entity type, based on the last mastering flow run:

  • Source Datasets: The number of source (input) datasets.
  • Source Records: The total number of records from all source datasets.
  • Entities: The number of entities resulting from data mastering.
  • % Duplicates: The percentage of records in source datasets that are part of a multi-record cluster.
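The metric definitions above reduce to simple counts over the clustered records. The sketch below computes all four from a list of records, each tagged with its source dataset and the entity (cluster) it was assigned to. The `SourceRecord` type and `entity_type_metrics` function are hypothetical, and the sketch assumes every source dataset contributes at least one record (an empty dataset would not be counted here).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceRecord:
    record_id: str
    dataset: str    # source (input) dataset the record came from
    entity_id: str  # cluster the record was assigned to by mastering


def entity_type_metrics(records: List[SourceRecord]) -> Dict[str, float]:
    """Compute entity type metrics as defined above (illustrative sketch)."""
    cluster_sizes = Counter(r.entity_id for r in records)
    # Records that share their cluster with at least one other record.
    in_multi_record_cluster = sum(1 for r in records if cluster_sizes[r.entity_id] > 1)
    return {
        "source_datasets": len({r.dataset for r in records}),
        "source_records": len(records),
        "entities": len(cluster_sizes),
        "pct_duplicates": 100.0 * in_multi_record_cluster / len(records) if records else 0.0,
    }


records = [
    SourceRecord("a1", "crm", "E1"),
    SourceRecord("b7", "erp", "E1"),
    SourceRecord("a2", "crm", "E2"),
    SourceRecord("b8", "erp", "E3"),
    SourceRecord("a3", "crm", "E3"),
]
print(entity_type_metrics(records))
```

Here four of the five records sit in two-record clusters (E1 and E3), so % Duplicates is 80.0, while the single-record cluster E2 still counts as one entity.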

To view entity type metrics: Select an entity type tile in Curator to open the entity type. Metrics display at the top of the Entities and Fields tabs.

[Figure: Entity type metrics in Curator]

Entity Metrics

For each entity, you can view, sort, and filter by:

  • Source Records: The number of source records in the cluster for that entity.
  • Source Datasets: The number of source datasets from which the clustered records originated.
  • Similar Entities: The number of similar entities.
  • Has Value Overrides: Whether the entity has any field value overrides (yes/no).
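The per-entity columns above can be derived by grouping source records by entity. This minimal sketch computes Source Records, Source Datasets, and Has Value Overrides, then sorts and filters the rows the way you might in the Entities table. It omits Similar Entities, because how Curator determines similarity is not described here. The function name `entity_metrics` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (record_id, dataset, entity_id); hypothetical layout


def entity_metrics(records: List[Record], overridden_entities: Set[str]) -> Dict[str, dict]:
    """Group records by entity and compute per-entity metrics (illustrative sketch)."""
    by_entity: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_entity[rec[2]].append(rec)
    return {
        entity_id: {
            "source_records": len(recs),
            "source_datasets": len({dataset for _, dataset, _ in recs}),
            "has_value_overrides": "yes" if entity_id in overridden_entities else "no",
        }
        for entity_id, recs in by_entity.items()
    }


records = [("a1", "crm", "E1"), ("b7", "erp", "E1"), ("a2", "crm", "E2"), ("a4", "crm", "E2")]
rows = entity_metrics(records, overridden_entities={"E2"})

# Filter to entities drawn from more than one dataset, largest clusters first.
multi_source = sorted(
    (e for e, m in rows.items() if m["source_datasets"] > 1),
    key=lambda e: -rows[e]["source_records"],
)
print(rows)
print(multi_source)
```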

To view entity metrics: Select an entity type tile in Curator to open the entity type. On the Entities tab, entity metrics are listed as columns in the Entities table.

Field Metrics

For each entity type field, you can view its completeness: the percentage of source records that have a non-null value for that field.
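Field completeness is the share of source records whose value for a field is not null. The sketch below computes that percentage for each field. It assumes records are plain dictionaries and that only a missing key or `None` counts as null; whether Curator also treats empty strings as null is not stated here. The `field_completeness` function is a hypothetical name.

```python
from typing import Dict, List, Optional


def field_completeness(
    records: List[Dict[str, Optional[str]]], fields: List[str]
) -> Dict[str, float]:
    """Percentage of records with a non-null value for each field (illustrative)."""
    total = len(records)
    result: Dict[str, float] = {}
    for field in fields:
        # A missing key and an explicit None are both treated as null.
        non_null = sum(1 for r in records if r.get(field) is not None)
        result[field] = 100.0 * non_null / total if total else 0.0
    return result


records = [
    {"name": "Acme", "city": None},
    {"name": "Beta", "city": "Boston"},
    {"name": None},
    {"name": "Gamma", "city": "Austin"},
]
print(field_completeness(records, ["name", "city"]))  # name: 3 of 4, city: 2 of 4
```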

To view field metrics: Select an entity type tile in Curator to open the entity type. Select the Fields tab.