Publishing from the Tamr RealTime Datastore

Records in the system of record (SOR) are assigned a persistent identifier called the record ID (rec_id). Updates and merges of source records and Tamr IDs are tracked in the record history of every record in the SOR.

You can publish the following datasets from the SOR to a configured S3, ADLS2, GCS, BigQuery, or Snowflake publish destination:

  • SOR records
  • Tamr ID to record ID mappings
  • Relationships

Depending on the destination, these datasets are published as either tables or multi-part UTF-8 encoded CSV files. CSV files are named part-<ID>.csv.
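For file-based destinations, the part files can be read back as a single dataset. The following is a minimal sketch, assuming the part-<ID>.csv files have already been downloaded to a local directory and that each file begins with a header row; neither assumption is confirmed by this page, and the function name read_published_parts is hypothetical.

```python
import csv
import glob
import os


def read_published_parts(directory):
    """Read every part-<ID>.csv file in `directory` into one list of row dicts."""
    rows = []
    # Sort the paths so the read order is deterministic.
    for path in sorted(glob.glob(os.path.join(directory, "part-*.csv"))):
        # The published files are UTF-8 encoded; newline="" is what the csv module expects.
        with open(path, newline="", encoding="utf-8") as handle:
            # Assumption: each part file starts with a header row naming the columns.
            rows.extend(csv.DictReader(handle))
    return rows
```

If the part files turn out to have no header row, pass the column names to csv.DictReader through its fieldnames argument instead.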

Configuring System of Record Publish Destinations

  1. Navigate to Configurations > Destinations and select New Destination.
  2. Depending on the dataset you want to publish, choose one of the following destination types:
    1. RealTime Records
    2. RealTime Tamr ID Mapping
    3. RealTime Relationships
  3. Name the destination, select the RealTime table, and select the connection to use to publish the dataset. In the Address field, enter the table or file path to publish to.
  4. Select Create Destination.

The new destination appears on the Destinations page. You can publish the dataset by choosing the Play icon or by using the Jobs API (see Jobs Overview).

If you run the job via the API, the request requires the destinationId of the destination, which is available on the Configurations > Destinations page.

Published Dataset Schemas

Schema for SOR Records

This output includes the following columns:

  • recordId
  • tableId
  • versionId
  • data
  • createdMs (time the record was created in milliseconds since Unix epoch)
  • updatedMs (time the record was updated in milliseconds since Unix epoch)
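The columns above can be turned into a typed record on the consumer side. This is a sketch under stated assumptions: the row arrives as a dict keyed by the published column names, and the data column holds a JSON object serialized as a string (this page does not specify the format of data). The names SorRecord, parse_sor_record, and ms_to_datetime are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def ms_to_datetime(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


@dataclass
class SorRecord:
    record_id: str
    table_id: str
    version_id: str
    data: dict
    created: datetime
    updated: datetime


def parse_sor_record(row):
    """Build a SorRecord from a dict keyed by the published column names."""
    return SorRecord(
        record_id=row["recordId"],
        table_id=row["tableId"],
        version_id=row["versionId"],
        # Assumption: `data` is a JSON object serialized as a string.
        data=json.loads(row["data"]),
        created=ms_to_datetime(row["createdMs"]),
        updated=ms_to_datetime(row["updatedMs"]),
    )
```

Converting createdMs and updatedMs to UTC datetimes up front avoids mixing millisecond integers with second-based timestamps later in a pipeline.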

Schema for Tamr ID to Record ID Mappings

The SOR mapping table contains the Tamr ID from the data product outputs and its associated rec_id in the SOR.

Additionally, the mapping table contains the following fields:

  • tamrID
  • tableId
  • recordId
  • versionId
  • createdMs (time the record was created in milliseconds since Unix epoch)
  • updatedMs (time the record was updated in milliseconds since Unix epoch)
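One common use of the mapping fields listed above is a lookup from Tamr ID to record ID. The sketch below assumes the rows are dicts keyed by the published field names. Because this page does not state whether a Tamr ID can appear in more than one row, each Tamr ID maps to a set of record IDs rather than a single value. The function name build_tamr_id_lookup is hypothetical.

```python
from collections import defaultdict


def build_tamr_id_lookup(mapping_rows):
    """Map each tamrID to the set of recordId values published for it."""
    lookup = defaultdict(set)
    for row in mapping_rows:
        # A set keeps the lookup correct whether a Tamr ID appears once or many times.
        lookup[row["tamrID"]].add(row["recordId"])
    return dict(lookup)
```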

Below is a sample of the mapping table output:


Schema for Relationships

This output includes the following columns:

  • relationshipId
  • fromTableId
  • fromRecordId
  • toTableId
  • toRecordId
  • relationshipTypeId
  • versionId
  • relationshipDetails
  • createdMs (time the record was created in milliseconds since Unix epoch)
  • updatedMs (time the record was updated in milliseconds since Unix epoch)
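The relationship columns above lend themselves to an index of outgoing relationships per record. This is a sketch, assuming the rows are dicts keyed by the published column names and that a relationship is read as directed from the "from" record to the "to" record; the column names suggest that direction, but this page does not confirm it. The function name build_outgoing_index is hypothetical.

```python
from collections import defaultdict


def build_outgoing_index(relationship_rows):
    """Index relationships by their source record.

    Keys are (tableId, recordId) pairs; values are lists of
    (relationshipTypeId, (toTableId, toRecordId)) tuples.
    """
    index = defaultdict(list)
    for row in relationship_rows:
        # A record ID is paired with its table ID so the key names one record in one table.
        source = (row["fromTableId"], row["fromRecordId"])
        target = (row["toTableId"], row["toRecordId"])
        index[source].append((row["relationshipTypeId"], target))
    return dict(index)
```

Looking up a key such as (tableId, recordId) then returns every relationship that starts at that record, in the order the rows were read.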