Modifying Transformations for Your Data

Transformations allow you to change the data in a unified dataset without having to modify individual input datasets. For example, you can use transformations to concatenate address or name fields, remove whitespace or delimiters, apply consistent formatting to dates, and so on.

You only need to modify Transformations steps if you added new output fields in the Align to Customer Model (Schema Mapping) step.

Tamr Cloud supports scripted transformations for a single input dataset, resulting in a single output dataset.

See the Tamr Core transformations documentation for transformation function details.

Modifying Transformations to Prepare Data for Enrichment

For each new output field you added, add a line in the transformations to prepare data for enrichment to handle null values.

If you deleted any optional output fields, see Requirements for Input Datasets for necessary changes to transformations, specific to the entity type template.

To update transformations to prepare data for enrichment:

  1. In Designer, select the type type tile to open the flow.
  2. Select the Prepare Data for Enrichment step.
  3. For each new output field, add a line to handle the null case under // Replace empty strings in <field> with true nulls:
use input;
select *,
    // Replace empty strings in <field> with true nulls
    case when is_not_empty(<field>) then <field> else null end as <field>;
  1. When you have finished the transformation updates, navigate back to the flow by selecting the back arrow next to the step description.

Modifying Transformations for Record Consolidation

For each new output field you added, add a line in the transformations to tell Tamr what value to set for that field when creating the mastered entity record.

If you deleted any optional output fields, see Requirements for Input Datasets for necessary changes to transformations, specific to the entity type template.

Depending on the field type, add one of these transformation rules:

  • To set the value to the most common value across input records:
    mode(<field>) as <field>
  • To set the value to the sum of the values across input records:
    sum(to_double(<field>)) as <field>

To update transformations to consolidate records:

  1. In Designer, select the type type tile to open the flow.
  2. Select the Consolidate Records step to open it.
  3. For each new output field, add one of the following two rules under //Groupby :
    • mode(<field>) as <field> to select the most common value for this field.
    • sum(to_double(<field>)) as <field> to calculate a total value for this field.
  4. When you have finished the transformation updates, navigate back to the flow by selecting the back arrow next to the step description.