Connecting to External Data Repositories

You use connections to configure access to cloud storage locations or databases. These locations store source datasets (source connections) or published entity data (destination connections).

Connect to your organization's cloud storage locations or databases in order to:

  • Import source data for use in your data products.
  • Publish data product data to external locations for use by downstream systems.

Permissions for Managing Connections

Your ability to add, use, and edit connections depends on your user role and permissions.

If you have permission to modify a flow, you can add source datasets to that flow regardless of which connection they were added from.

If you have permission to publish data, you can select any existing connection when configuring a data product's publish destination.

About Connecting to Your Data

This diagram illustrates how connections allow you to securely move data between Tamr Cloud and your organization's cloud storage systems, both when adding source data and when publishing results. The sections below describe each part of the process in detail.

1: Preparing Data in Your Organization's Source Systems

The source data for your data products might be stored in your organization's applications, databases, and/or file systems.

Before adding source data to Tamr Cloud, perform any extract, transform, and load (ETL) steps needed to ensure that the source datasets meet Tamr's requirements. Then move or copy that data into a staging zone.

A staging zone is the bucket or database, within a cloud storage system, that you have connected to Tamr Cloud. Your organization fully manages the staging zone and allows Tamr Cloud to import data from it through a configured, secure connection.
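
As an illustration of the preparation step, the following minimal Python sketch trims whitespace, skips empty rows, checks for required columns, and writes the cleaned file to a local directory that stands in for the staging zone. The function name stage_csv, the column names, and the checks themselves are hypothetical. They are not Tamr's documented requirements, and a real staging zone would be a cloud bucket or database rather than a local folder.

```python
import csv
from pathlib import Path


def stage_csv(source_path, staging_dir, required_columns):
    """Clean a CSV file and write it into a staging directory.

    Hypothetical helper: the local directory stands in for a cloud
    staging zone, and the checks are illustrative only.
    Returns the number of data rows written.
    """
    source_path = Path(source_path)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    target_path = staging_dir / source_path.name

    with source_path.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        # Normalize header names by trimming surrounding whitespace.
        header = [name.strip() for name in next(reader, [])]
        missing = [col for col in required_columns if col not in header]
        if missing:
            # Fail before anything is written to the staging directory.
            raise ValueError(f"missing required columns: {missing}")

        rows_written = 0
        with target_path.open("w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            writer.writerow(header)
            for row in reader:
                cleaned = [value.strip() for value in row]
                if not any(cleaned):
                    continue  # skip rows in which every value is empty
                writer.writerow(cleaned)
                rows_written += 1
    return rows_written
```

A call such as stage_csv("customers.csv", "staging", ["id", "name"]) would write the cleaned copy to staging/customers.csv. In practice, the final copy step would use your cloud provider's own upload tooling.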

2: Adding and Publishing Data in Tamr Cloud

Once your source data is available in the connected staging zone, you can add, or import, those datasets to Tamr Cloud. You can think of the Sources table in Tamr Cloud as the landing zone for your data.

When your source datasets are available in Tamr Cloud, you can:

  • Add them to your data product.
  • Run the flow to clean and enrich your data, deduplicate your source data, and create a mastered entity representing each source record cluster.
  • Review and curate your mastering results.
  • Rerun the flow to apply any data curation changes.

After reviewing your data product, you can publish, or export, the data product datasets from Tamr Cloud (the publish zone) to your organization's delivery zone, described in the next step.

Once you confirm that your connections and flow are working as expected, you can schedule source updates, flow runs, and publishing jobs.

3: Using Published Data in Your Organization's Downstream Systems

The delivery zone is the bucket or database, within a cloud storage system, that you have connected to Tamr Cloud. Your organization fully manages the delivery zone, and Tamr Cloud writes the published, or exported, datasets to this location.

Move or copy the data product datasets to a location that can be accessed by your downstream systems, such as analytics tools.

Perform any extract, transform, and load (ETL) steps needed to ensure that the datasets meet the requirements of these systems.

Use the exported data product datasets in your organization's analytics tools, workflow management tools, and so on.
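
The downstream ETL step described above could look like the following minimal Python sketch. It assumes the published dataset has been copied out of the delivery zone as a CSV file and converts it to JSON Lines, keeping and renaming only the columns a downstream tool needs. The function name delivered_csv_to_jsonl and all column names are hypothetical. The actual export format and schema depend on your data product and connection.

```python
import csv
import json
from pathlib import Path


def delivered_csv_to_jsonl(delivered_csv, output_path, column_map):
    """Convert a delivered CSV file to JSON Lines for a downstream tool.

    column_map maps column names in the delivered file to the names the
    downstream system expects; unlisted columns are dropped. A column
    named in column_map but absent from the file raises KeyError.
    Returns the number of records written.
    """
    count = 0
    with Path(delivered_csv).open(newline="", encoding="utf-8") as src, \
            Path(output_path).open("w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Keep only the mapped columns, under their new names.
            record = {new: row[old] for old, new in column_map.items()}
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count
```

For example, delivered_csv_to_jsonl("entities.csv", "entities.jsonl", {"entity_id": "id", "full_name": "name"}) would write one JSON object per mastered entity, with only the id and name fields.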

Configuring Cloud Connections

See the following topics: