Importing Source Data from an External Data Repository

You can add new source datasets from cloud storage locations or database connections for use in mastering flows.

Adding a Source from Cloud Storage Locations

You can add source dataset files from cloud storage locations, such as AWS S3 or Google Cloud Storage. Before you begin, see Requirements for Source Datasets.

  1. Navigate to Admin > Sources.
  2. In the top right, select Add Source.
  3. Enter a Source Name and optional description, then select Next.
  4. Select a connection type, then select Next.
  5. Choose an existing connection, or add a new one, then select Next.
  6. Choose a source to upload, then select Next.
  7. Specify the following settings for the file. As you change these settings, the data preview updates.
    • Delimiter: (required) Tamr Cloud supports comma, tab, semicolon, space, and pipe delimiters.
    • Header Row: Specify whether the file contains a header row.
    • Header Row Number: If the file contains a header row, enter the row number for the header row.
    • Quote Character: Specify whether field values include quote characters. If you set this to yes, Tamr Cloud considers strings within quotes as a single value; delimiter characters within quotes are ignored.
      Example: A comma-delimited file contains a field value "Product, Development, Marketing".
      - If Quote Character is set to yes, the field value in the uploaded file is Product, Development, Marketing.
      - If Quote Character is set to no, the value is split at each comma into three fields: field_1 is "Product (including the opening quote), field_2 is Development, and field_3 is Marketing" (including the closing quote).
  8. Save your settings.

You can now use the file as source data in a mastering flow and share the file with other users.
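
To see how the Quote Character setting changes parsing before you upload a file, you can reproduce the example from step 7 locally. The following is a minimal sketch that uses Python's standard csv module. The function name parse_line and its parameters are hypothetical and are not part of Tamr Cloud, and Tamr Cloud's own parser may differ in details such as whitespace trimming.

```python
import csv
import io


def parse_line(line, delimiter=",", quote_character=True):
    """Split one delimited line into field values.

    quote_character=True approximates the "yes" setting: text inside
    double quotes is one value, and delimiters inside it are ignored.
    quote_character=False approximates the "no" setting: quotes are
    treated as ordinary characters.
    """
    quoting = csv.QUOTE_MINIMAL if quote_character else csv.QUOTE_NONE
    reader = csv.reader(io.StringIO(line), delimiter=delimiter, quoting=quoting)
    return next(reader)


line = '"Product, Development, Marketing"'

# Quote Character = yes: one field, quotes removed.
print(parse_line(line, quote_character=True))
# ['Product, Development, Marketing']

# Quote Character = no: three fields; the quotes stay in the values.
# Python's csv module keeps the space after each comma.
print(parse_line(line, quote_character=False))
# ['"Product', ' Development', ' Marketing"']
```

The delimiter argument accepts any single character, so the same check works for tab, semicolon, space, and pipe delimited files, for example parse_line(line, delimiter="|").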


Adding a Source from a Database Connection

You can add source datasets from connected databases, such as Snowflake. Before you begin, see Requirements for Source Data.

  1. Navigate to Admin > Sources.
  2. In the top right, select Add Source.
  3. Enter a Source Name and optional description, then select Next.
  4. Configure column names and data types, then select Next.
  5. Review the data preview.
  6. Select Save to add the data source.

You can now use the dataset as source data in a mastering flow and share it with other users.
