Requirements for Source Data

Before you add a source dataset to Tamr Cloud, ensure that it meets these requirements.

The following table provides general requirements for source datasets. In addition to these requirements, data product templates have specific source dataset requirements. See Data Product Templates.

See Troubleshooting and Known Issues for the latest known issues and limitations for source datasets.

Category: Requirements
Dataset/File Name: Each source dataset must have a unique name.
Primary Key: Each source dataset must have a unique primary key field. See the About Primary Keys section below for more information.
File Extension (uploaded source files):
- CSV (.csv)
- TXT (.txt)
- TSV (.tsv)
- PSV (.psv)
File Size (uploaded source files): 500MB maximum
Delimiters (uploaded source files):
- Comma
- Tab
- Space
- Pipe
Row Separators:
- Newline
- Carriage return followed by a newline
Header Column Names:
- Maximum length is 300 characters.
- Column names must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore.
- Column names must be unique (case-sensitive).
- Column names cannot include spaces. Remove any leading and trailing spaces, and remove any spaces within the name.
- Quoted values are allowed (for uploaded files).
- Column names cannot start with any of the following prefixes:
  - _TABLE_
  - _FILE_
  - _PARTITION_
  - _ROW_TIMESTAMP_
  - __ROOT__
  - _COLIDENTIFIER_
Data Fields:
- All values must be string values.
- Encoding: UTF-8 or Windows-1252
- Double-quoted values are allowed.
- Data in each row must map to the header columns.
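
The header column name rules above can be checked before upload. The following is a minimal sketch, not a Tamr Cloud tool: the function name `validate_column_names` is hypothetical. It makes two assumptions. Prefixes are compared case-insensitively, to err on the strict side. "Unique (case-sensitive)" is read as an exact-match comparison, so `ID` and `id` count as different names.

```python
import re

MAX_NAME_LENGTH = 300
# Letters, digits, and underscores only; first character is a letter or underscore.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
RESERVED_PREFIXES = (
    "_TABLE_", "_FILE_", "_PARTITION_",
    "_ROW_TIMESTAMP_", "__ROOT__", "_COLIDENTIFIER_",
)


def validate_column_names(names):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen = set()
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"{name[:20]!r}...: longer than {MAX_NAME_LENGTH} characters")
        if not NAME_PATTERN.fullmatch(name):
            # Also catches spaces, empty names, and names that start with a digit.
            problems.append(f"{name!r}: invalid characters or invalid first character")
        # Assumption: prefix comparison ignores case.
        if name.upper().startswith(RESERVED_PREFIXES):
            problems.append(f"{name!r}: starts with a reserved prefix")
        # Assumption: uniqueness is an exact, case-sensitive match.
        if name in seen:
            problems.append(f"{name!r}: duplicate column name")
        seen.add(name)
    return problems
```

For example, `validate_column_names(["id", "first name"])` reports one problem, because `first name` contains a space.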

About Primary Keys

Primary key values must be unique across all source datasets. If a dataset does not contain a column that meets this requirement, create one by adding a column that contains the filename followed by the row number (filename_rownumber).
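
One way to build such a column for a comma-delimited, UTF-8 file is sketched below. This is an illustration, not a Tamr Cloud utility: the function name `add_primary_key` and the column name `primary_key` are hypothetical. It also assumes that "filename" means the file name without its extension and that row numbering starts at 1 for the first data row.

```python
import csv
from pathlib import Path


def add_primary_key(src_path, dst_path, key_column="primary_key"):
    """Copy a CSV file, prepending a filename_rownumber key column."""
    src = Path(src_path)
    with src.open(newline="", encoding="utf-8") as fin, \
            open(dst_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        if key_column in header:
            raise ValueError(f"column {key_column!r} already exists")
        writer.writerow([key_column] + header)
        # Assumption: the first data row is row 1; src.stem drops the extension.
        for row_number, row in enumerate(reader, start=1):
            writer.writerow([f"{src.stem}_{row_number}"] + row)
```

Because the key is derived from row position, regenerating it after rows are inserted, deleted, or reordered would assign different keys to the same records. It is safer to generate the column once and keep it in the source file.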

Important: If you create your own primary key, you must preserve it when you update the source files so that Tamr Cloud can maintain the persistent IDs it has assigned.
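
Before replacing a source file, it can help to compare the keys in the previous and updated versions. The sketch below is only a local sanity check under stated assumptions: both files are comma-delimited UTF-8, and the key column is named `primary_key` (a hypothetical name). The helper names `read_keys` and `compare_keys` are also hypothetical. Keys reported as missing are not necessarily errors, since rows may have been removed on purpose.

```python
import csv


def read_keys(path, key_column):
    """Collect the set of key values from one CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key_column] for row in csv.DictReader(f)}


def compare_keys(old_path, new_path, key_column="primary_key"):
    """Report keys that disappeared from, or were added to, the updated file."""
    old_keys = read_keys(old_path, key_column)
    new_keys = read_keys(new_path, key_column)
    return {
        "missing": sorted(old_keys - new_keys),
        "added": sorted(new_keys - old_keys),
    }
```

A large number of keys that are both missing and added at once may indicate that the keys were regenerated instead of preserved.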