Importing Records to the Tamr RealTime Datastore
Limited Release Feature
This feature is in limited release and is subject to change.
You can configure a one-time workflow to import existing golden records into a configured table in the Tamr RealTime System of Record (SOR) datastore.
Importing source records into the SOR involves:
- Ensuring the source file meets the necessary requirements for record create and update.
- Adding the source file in Tamr Cloud.
- Configuring and running the workflow to import the source records into the SOR.
- Once the records are imported, deleting the source and workflow.
Source File Requirements
File Formats
- File formats: NDJSON, Avro, and Parquet
- Encoding: UTF-8
Record Create: Required and Optional Source Columns
| Column | Data Type | Description | Required? |
|---|---|---|---|
| data | JSON | The golden record's data. The record data schema is specific to the data product. | Yes |
| externalIds | array&lt;string&gt; | The list of external ids to link to the golden record. A value may only appear once across all records in the import. | No |
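A record-create source file in NDJSON format contains one JSON object per line with the columns above. The sketch below, in Python, writes such a file; the inner data fields (such as "name") and the external id values are hypothetical placeholders, since the data schema is specific to your data product.

```python
import json

# Each NDJSON line is one record-create payload.
# The "data" schema here is a hypothetical placeholder; the real
# schema is specific to your data product.
records = [
    {"data": {"name": "Tamr"}, "externalIds": ["crm-001"]},
    {"data": {"name": "Tamr Inc."}},  # externalIds is optional
]

# Write UTF-8 encoded NDJSON: one JSON object per line.
with open("create_records.ndjson", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```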
Record Update: Required and Optional Source Columns
| Column | Data Type | Description | Required? |
|---|---|---|---|
| recordId | string | The Tamr record id (rec_&lt;id&gt;). If this record id exists, the record is updated. If it does not exist, the record is created as a new record. | Yes |
| versionId | number | The Tamr record's version id. If provided, Tamr performs a consistency check. If the version id does not match, the record is skipped. | Yes |
| data | JSON | The golden record's data. The record data schema is specific to the data product. | Yes |
| externalIds | array&lt;string&gt; | The list of external ids to link to the golden record. A value may only appear once across all records in the import. | No |
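A record-update payload adds recordId and versionId to the create columns. The sketch below writes a one-line NDJSON update file; the record id, version id, and data fields are hypothetical placeholders.

```python
import json

# Record-update payloads reference an existing Tamr record id
# (rec_<id>) and its current version id. The values below are
# hypothetical placeholders.
updates = [
    {
        "recordId": "rec_123",
        "versionId": 4,
        "data": {"name": "Tamr Inc."},
        "externalIds": ["crm-001"],
    },
]

with open("update_records.ndjson", "w", encoding="utf-8") as f:
    for rec in updates:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

If the version id no longer matches the record's current version (for example, because the record changed between export and import), that record is skipped rather than overwritten.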
Validating the Schema for Importing Records
The supported import file formats (NDJSON, Avro, and Parquet) each support schema validation on read and write. Apply schema validation when creating the import record file.
Contact Tamr Support ([email protected]) for the schema for your data product.
| Validation | Example |
|---|---|
| data cannot be an empty dictionary. | The value "data": {} is invalid. |
| data cannot contain duplicate column names. | The value "data": {"name": "Tamr", "name": "Tamr Inc."} is invalid. |
| externalIds must contain a list of unique values. | The value externalIds: ["1", "1"] is invalid. |
| externalIds cannot contain an empty string or null. | The values externalIds: ["", "1"] and externalIds: [null, "1"] are invalid. |
| externalIds must obey a strict 1:n relationship with records. | The following two records break this relationship because external id "1" is linked to both: {"data": {... }, "externalIds": ["1"]} and {"data": {... }, "externalIds": ["2", "1"]}. In this case, the second record is skipped. |
Record ordering is not preserved.
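The rules in the validation table can be checked before upload. The sketch below is an illustrative pre-flight check written in Python for NDJSON input, not Tamr's own validator; it flags records that the rules above would reject, including external ids reused across records.

```python
import json

def _reject_duplicate_keys(pairs):
    # json.loads silently keeps the last value for duplicate keys,
    # so detect duplicates via an object_pairs_hook instead.
    keys = [k for k, _ in pairs]
    if len(set(keys)) != len(keys):
        raise ValueError("duplicate column names")
    return dict(pairs)

def validate_records(lines):
    """Return a list of (line_number, reason) for records that would
    be skipped. Illustrative sketch only, not Tamr's validator."""
    problems = []
    seen_external_ids = set()
    for n, line in enumerate(lines, start=1):
        try:
            rec = json.loads(line, object_pairs_hook=_reject_duplicate_keys)
        except ValueError:
            problems.append((n, "duplicate column names"))
            continue
        if not rec.get("data"):
            problems.append((n, "data is empty"))
            continue
        ext = rec.get("externalIds", [])
        if any(not e for e in ext):  # empty string or null
            problems.append((n, "externalIds contains empty string or null"))
            continue
        if len(set(ext)) != len(ext):
            problems.append((n, "duplicate externalIds within record"))
            continue
        overlap = seen_external_ids.intersection(ext)
        if overlap:
            problems.append((n, "externalIds already used by an earlier record"))
            continue
        seen_external_ids.update(ext)
    return problems
```

Running the check over a file before adding it as a source avoids silently skipped records after the workflow completes.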
Adding the Source File
In Configurations > Sources, add the source file from a configured connection or direct upload. See Managing Sources for instructions.
Importing Records Using a Workflow
To import records from a source file:
- Navigate to Configurations > Workflows.
- Select New Workflow and choose Import Records into System of Record.
- Configure the workflow:
- Add a name for the workflow.
- Select the source file.
- Select the Tamr RealTime datastore table to which to import the records.
- Choose whether to delete any existing records in the table before import (disabled by default).
- Select Create Workflow.
- On the Workflows page, run the workflow by selecting the Run icon.
Important:
- The workflow job status is marked as complete once all records have been successfully queued for import to the SOR. These records are not immediately available in the SOR.
- Do not re-run the workflow. Re-running recreates the same records with new record id values, resulting in duplicates.
Deleting the Workflow and Source
Once all imported records are available in the SOR, delete the workflow and source in Tamr Cloud to prevent accidental re-running of the workflow.