Connect to your organization's cloud storage locations or databases in order to:
- Import source data for use in your data products.
- Publish data product data to external locations for use by downstream systems.
Your ability to add, use, and edit connections depends on your user role and permissions.
If you have permission to use a connection and modify a flow, you can add datasets stored in the connection's cloud storage location to your mastering flow.
If you have permission to use a connection and publish data, you can select the connection when configuring a data product's publish destination.
This diagram illustrates how connections allow you to securely add and publish data between Tamr Cloud and your organization's cloud storage systems. Below, learn more about each part of the process in detail.
The source data for your data products might be stored in your organization's applications, databases, and/or file systems.
Before adding source data to Tamr Cloud, perform any Extract, Transform, or Load (ETL) steps needed to ensure that the source datasets meet Tamr's requirements. Move or copy that data into a staging zone.
A staging zone is a cloud storage system, and the bucket or database within that system, that you have connected to Tamr Cloud. Your organization fully manages the staging zone and allows Tamr Cloud to import data from it through a configured, secure connection.
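As an illustration of the kind of pre-import check you might run before copying a file into the staging zone, here is a minimal Python sketch. Tamr's actual dataset requirements are not listed on this page, so the three checks below (no blank column names, no duplicate column names, consistent field counts) are assumptions chosen for the example, and `check_csv_for_staging` is a hypothetical helper, not part of any Tamr tooling.

```python
import csv
import io


def check_csv_for_staging(csv_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed.

    The checks are illustrative assumptions, not Tamr's documented requirements.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header contains a blank column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")

    # Every data row should have exactly as many fields as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems
```

Running a check like this as the last ETL step means that malformed files are caught in your own environment, before they reach the connected staging zone.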
Once your source data is available in the connected staging zone, you can add, or import, those datasets to Tamr Cloud. You can think of the Sources table in Tamr Cloud as the landing zone for your data.
When your source datasets are available in Tamr Cloud, you can:
- Add them to your data product.
- Run the flow to clean and enrich your data, deduplicate your source data, and create a mastered entity representing each source record cluster.
- Review and curate your mastering results.
- Rerun the flow to apply any data curation changes.
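To make the idea of "a mastered entity representing each source record cluster" concrete, the toy Python sketch below groups records by a normalized name and builds one output record per group. This is not how Tamr Cloud matches records, which this page does not describe. The field names (`id`, `name`, `city`), the normalization rule, and the "first non-empty value wins" rule are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Crude cluster key: lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def master_records(records: list[dict]) -> list[dict]:
    """Cluster records by normalized name and build one entity per cluster."""
    clusters = defaultdict(list)
    for record in records:
        clusters[normalize(record["name"])].append(record)

    entities = []
    for members in clusters.values():
        # Keep track of which source records were merged into this entity.
        entity = {"source_ids": [m["id"] for m in members]}
        # For each field, keep the first non-empty value found in the cluster.
        for field in ("name", "city"):
            entity[field] = next((m[field] for m in members if m.get(field)), None)
        entities.append(entity)
    return entities
```

In this sketch, two records named "Acme Inc." and "ACME inc" fall into the same cluster and yield a single entity, with a missing city on one record filled in from the other.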
After reviewing your data product, you can publish, or export, the data product datasets from Tamr Cloud (the publish zone) to your organization's delivery zone, described in the next step.
Once you confirm that your connections and flow are working as expected, you can schedule source updates, flow runs, and publishing jobs.
The delivery zone is a cloud storage system, and the bucket or database within that system, that you have connected to Tamr Cloud. Your organization fully manages the delivery zone, and Tamr Cloud writes the published, or exported, datasets to this location.
Move or copy the data product datasets to a location that can be accessed by your downstream systems, such as analytics tools.
Perform any Extract, Transform, or Load (ETL) steps needed to ensure that the datasets meet the requirements of these systems.
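As one example of a post-export transformation, the Python sketch below converts a dataset from CSV to JSON Lines for a downstream tool that expects one JSON object per line. Both formats are assumptions made for illustration: this page does not state the format of published datasets, and `csv_to_json_lines` is a hypothetical helper.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text (with a header row) to JSON Lines, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # All values stay strings; type conversion is left to the downstream system.
    return "\n".join(json.dumps(row) for row in reader)
```

A step like this would run in your own environment, after Tamr Cloud has written the datasets to the delivery zone and before downstream systems read them.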
Use the exported data product datasets in your organization's analytics tools, workflow management tools, and so on.