Requirements for Source Data
Before you add a source dataset to Tamr Cloud, ensure that it meets these requirements.
The following table provides general requirements for source datasets. In addition to these requirements, data product templates have specific source dataset requirements. See Data Product Templates.
See Troubleshooting and Known Issues for the latest known issues and limitations for source datasets.
|Dataset/File Name||Each source dataset must have a unique name.|
|Primary Key||Each source dataset must have a unique primary key field. See the About Primary Keys section below for more information.|
|File Extension (uploaded source files)||- CSV (.csv)|
- TXT (.txt)
- TSV (.tsv)
- PSV (.psv)
|File Size (uploaded source files)||500MB maximum|
|Delimiters (uploaded source files)||- Comma|
|Row Separators||- Newline|
- Carriage return followed by a newline
|Header Column Names||- Maximum length is 300 characters.|
- Column names must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_).
- Column names must be unique (case-sensitive).
- Column names cannot include spaces. Remove any leading or trailing spaces, and any spaces within the name.
- Quoted values are allowed (for uploaded files).
- Column names cannot include any of the following prefixes:
|Data Fields||- All values must be |
- Encoding: UTF-8 or Windows-1252
- Double-quoted values are allowed.
- Data in each row must map to the header columns.
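Some of the header and row requirements in the table above can be checked locally before you upload a file. The following is a minimal sketch, not a Tamr Cloud tool: the function names `check_header` and `check_rows` are hypothetical, and it covers only name length, allowed characters, exact-match (case-sensitive) uniqueness, and whether each row has as many fields as the header.

```python
import csv
import io
import re

MAX_NAME_LENGTH = 300
# Letters, digits, and underscores only; this also rejects spaces and empty names.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_header(names):
    """Return a list of problems found in the header column names."""
    problems = []
    seen = set()
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            problems.append(
                f"column name longer than {MAX_NAME_LENGTH} characters: {name[:30]!r}"
            )
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid characters in column name: {name!r}")
        # Assumption: "case-sensitive" means names are compared exactly,
        # so "Name" and "name" count as two different columns.
        if name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)
    return problems


def check_rows(text, delimiter=","):
    """Check the header, then report data rows whose field count differs from it."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    problems = check_header(header)
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            problems.append(
                f"data row {row_number}: expected {len(header)} fields, found {len(row)}"
            )
    return problems
```

For example, `check_rows("id,first name\n1,Ann\n2\n")` reports the space in `first name` and the short second data row. An empty list means none of the checked rules were violated; it does not guarantee that the file meets every requirement listed above.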
About Primary Keys
Primary keys must be unique across all datasets. If a dataset does not contain a unique primary key column, create one by adding a column that contains the filename followed by row number (
Important: If you create your own primary key, you must preserve this key whenever you update the source files so that the persistent IDs assigned by Tamr Cloud are maintained.
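The filename-plus-row-number approach described above can be sketched as follows. This is an illustration under stated assumptions, not a Tamr-provided utility: the function name `add_primary_key`, the column name `record_key`, and the underscore separator between filename and row number are all hypothetical choices.

```python
import csv
import os


def add_primary_key(src_path, dst_path, key_column="record_key", delimiter=","):
    """Copy a delimited file, prepending a key built from filename and row number."""
    file_name = os.path.basename(src_path)
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)

        header = next(reader)
        if key_column in header:
            raise ValueError(f"column {key_column!r} already exists")
        writer.writerow([key_column] + header)

        # Row numbers start at 1 for the first data row (the header is not counted).
        # Assumption: filename and row number are joined with an underscore.
        for row_number, row in enumerate(reader, start=1):
            writer.writerow([f"{file_name}_{row_number}"] + row)
```

Because these keys depend on row order, generate them once and keep the resulting column in the source file. Regenerating keys after rows are added, removed, or reordered would assign different keys to existing records, which conflicts with the requirement to preserve the key across updates.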