Running an End-to-End Flow for Legacy Data Products

At a high level, running a mastering flow in Tamr Cloud involves the following steps.

Connect Data

The first step is connecting your data to Tamr Cloud. To do this, you configure connections to your data storage locations, and then add your source datasets.

Configure Data Product

Next, add and configure your data product. Map columns from your source data to the industry-standard schema for your selected data product. Configure clustering rules and transformations, and then configure how attributes appear in the mastered data.
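To make the column-mapping step concrete, the following is a minimal sketch of renaming source columns to target schema attributes. It does not use any Tamr API; the column names, the attribute names, and the map_record helper are all hypothetical and chosen only for illustration. In Tamr Cloud you configure this mapping in the data product itself.

```python
# Hypothetical mapping from source column names to target schema attributes.
# None of these names come from Tamr Cloud; they are placeholders.
COLUMN_MAPPING = {
    "cust_name": "company_name",
    "addr_1": "address_line_1",
    "zip": "postal_code",
}


def map_record(source_record, mapping):
    """Return a new record keyed by target attribute names.

    Source columns that have no entry in the mapping are dropped.
    """
    return {
        target: source_record[source]
        for source, target in mapping.items()
        if source in source_record
    }


record = {
    "cust_name": "Acme Corp",
    "addr_1": "1 Main St",
    "zip": "02139",
    "internal_id": 7,  # unmapped, so it does not appear in the result
}
mapped = map_record(record, COLUMN_MAPPING)
print(mapped)
```

Unmapped source columns are simply left out of the result here; this is a choice made for the sketch, not a statement about how the product treats unmapped columns.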

Run Flow

Running the flow deduplicates your data by grouping similar records together into clusters, and then cleans the data and enriches it with third-party data. The flow produces the single best record representing each entity (the mastered entity, or golden record).
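As a rough illustration of what clustering and golden records mean, the sketch below groups records whose names match after simple normalization, then builds one representative record per cluster by taking the most common non-empty value for each attribute. This is a deliberately simplified stand-in, not the matching logic Tamr Cloud uses; the normalize, cluster_records, and golden_record functions and the sample data are assumptions made for this example.

```python
from collections import Counter, defaultdict


def normalize(name):
    """Lowercase the name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def cluster_records(records):
    """Group records that share the same normalized name."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[normalize(rec["name"])].append(rec)
    return list(clusters.values())


def golden_record(cluster):
    """Pick the most common non-empty value for each attribute."""
    golden = {}
    attributes = {key for rec in cluster for key in rec}
    for attr in sorted(attributes):
        values = [rec[attr] for rec in cluster if rec.get(attr)]
        golden[attr] = Counter(values).most_common(1)[0][0] if values else None
    return golden


records = [
    {"name": "Acme Corp", "city": "Boston"},
    {"name": "ACME Corp.", "city": "Boston"},
    {"name": "Acme Corp", "city": ""},
    {"name": "Globex", "city": "Springfield"},
]

clusters = cluster_records(records)
golden = [golden_record(cluster) for cluster in clusters]
for entity in golden:
    print(entity)
```

The three Acme variants collapse into one cluster and the Globex record forms its own, so four source records yield two mastered entities. Real flows rely on the clustering rules and attribute configuration you set up for the data product rather than an exact match on a normalized key.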

Review & Curate Results

Review your results with Insights. Use review tools such as sorting, filtering, and table views to look over your data. You can verify records that you are sure are clustered correctly; verified records are always clustered together, regardless of changes in the source data. If necessary, manually override any incorrect or incomplete values. Additionally, you can provide feedback on records and clusters and assign reviewers.

Publish Data

Export data to your storage systems to use in downstream applications. When you publish data, any data already published to the destination for the data product is overwritten. See Publishing Data Product Data.
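The overwrite behavior is worth keeping in mind when downstream applications read from the destination. The sketch below models a destination as an in-memory dictionary to show replace-on-publish semantics. It is only an illustration: the publish function, the destination structure, and the "customers" name are hypothetical and do not correspond to a Tamr Cloud API.

```python
def publish(destination, data_product, records):
    """Replace whatever was previously published for this data product."""
    # Assign a fresh list so earlier contents are discarded, not appended to.
    destination[data_product] = list(records)


destination = {}

# First publish: two mastered records land in the destination.
publish(destination, "customers", [{"id": 1}, {"id": 2}])

# Second publish: the earlier records are overwritten, not merged.
publish(destination, "customers", [{"id": 3}])

print(destination["customers"])
```

After the second call, only the record with id 3 remains for "customers". If a downstream application needs earlier results, it should copy them out of the destination before the next publish.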

Automate

Automate your flow using jobs. See Jobs Overview and Automating Scheduled Jobs.