While a mastering flow can contain multiple transformation steps, you typically need to modify only one of these steps: Consolidate Records. The Consolidate Records step applies transformation rules to produce a single record, called the mastered entity record, that best represents a cluster. For most attributes, these rules select the most common value from the clustered source records.
If you added new output attributes in the Schema Mapping step, you also need to add transformation rules to select the appropriate values for each of these attributes in the mastered entity record.
Common transformation rules:
- To set the value to the most common value across source records:
mode(<field>) as <field>
- To set the value to the sum of the values across source records:
sum(to_double(<field>)) as <field>
- To prefer a value from a specific source dataset across source records:
mode(...filter(lambda x: <source_dataset> == preferred_source_for_other_attributes, array(<field>))) as <field>
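To make the effect of these three rules concrete, here is a minimal Python sketch of what each one computes for a single cluster. It is not Tamr code: the record layout, the field names (`city`, `revenue`, `source_dataset`), the dataset names, and the helper functions are hypothetical, and the handling of nulls and ties shown here is an assumption rather than documented Tamr behavior.

```python
from collections import Counter

# Hypothetical cluster of source records; field and dataset names are made up.
cluster = [
    {"source_dataset": "crm", "city": "Boston", "revenue": "100.5"},
    {"source_dataset": "erp", "city": "Boston", "revenue": "20"},
    {"source_dataset": "web", "city": "Cambridge", "revenue": None},
]


def most_common(values):
    """Return the most frequent non-null value, or None if there is none."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else None


def total(values):
    """Sum the values after converting each non-null one to a float."""
    return sum(float(v) for v in values if v is not None)


def prefer_source(records, field, preferred):
    """Most common value of `field` among records from the preferred dataset."""
    return most_common(
        r[field] for r in records if r["source_dataset"] == preferred
    )


# One mastered entity record built from the cluster.
mastered = {
    "city": most_common(r["city"] for r in cluster),        # like mode(city)
    "revenue": total(r["revenue"] for r in cluster),        # like sum(to_double(revenue))
    "city_from_erp": prefer_source(cluster, "city", "erp"), # like the filtered mode
}
print(mastered)
```

In this sketch, each rule reduces all values of one attribute in the cluster to a single value, which is the role a transformation rule plays in the Consolidate Records step.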
If you need a more advanced rule to select an attribute value for the mastered entity record, see the Tamr Core transformations documentation for complete details on the available transformation functions.
Note: Tamr Cloud supports scripted transformations for a single input dataset, resulting in a single output dataset.
To update transformations to consolidate records:
Open the data product from the home page.
Select the Configure Flow page.
Select the Consolidate Records step to open it.
For each new output attribute, add the following transformation rule to select the most common value for this attribute:
mode(<field>) as <field>
For each new output attribute, ensure that the data type at the end of record consolidation is the type you expect. For example, to convert the attribute to a string:
SELECT *, to_string(<field>) AS <field>
When you have finished the transformation updates, navigate back to the flow by selecting the back arrow next to the step description.
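The type conversion in the steps above keeps every attribute of the mastered record and overwrites one attribute with its string form. A rough Python analogy, assuming a hypothetical record and a hypothetical `to_string` helper that leaves nulls untouched (how Tamr treats nulls here is an assumption):

```python
def to_string(value):
    """Convert a value to its string form, leaving nulls as None."""
    return None if value is None else str(value)


# Hypothetical mastered record after consolidation; names are illustrative.
mastered = {"city": "Boston", "revenue": 120.5, "employee_count": None}

# Keep all attributes and replace `revenue` with its string form,
# similar in spirit to: SELECT *, to_string(revenue) AS revenue
converted = {**mastered, "revenue": to_string(mastered["revenue"])}
print(converted)
```

Because the converted attribute reuses the original name, later steps see the same set of attributes with only the data type changed.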