Requirements for Input Datasets

The following table provides general requirements for input datasets. In addition to these requirements, entity type templates have specific input dataset requirements. See Tamr Cloud Templates.

See Troubleshooting and Known Issues for the latest known issues and limitations for input datasets.

Category: Requirements

Dataset/File Name: Each input dataset must have a unique name.

Primary Key: Each input dataset must have a unique primary key field. See the About Primary Keys section below for more information.

File Extension (uploaded source files):
- CSV (non-BOM)
- TSV (non-BOM)

File Size (uploaded source files): 100 MB maximum

Delimiters (uploaded source files):
- Comma
- Tab
- Space
- Pipe

Header Fields:
- Maximum length is 300 characters.
- Field names must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore.
- Field names must be unique (case-insensitive). For example, Column1 and column1 are duplicate names.
- Field names cannot include spaces. Remove any leading or trailing spaces from column names, and remove any spaces within the name.
- Quoted values are allowed (for uploaded files).
- Field names cannot begin with any of the following prefixes:
  - _TABLE_
  - _FILE_
  - _PARTITION
  - _ROW_TIMESTAMP
  - __ROOT__
  - _COLIDENTIFIER

Data Fields:
- All fields must be string values.
- Encoding: UTF-8 or Windows-1252
- Double-quoted values are allowed.
- Data in each row must map to the header fields.
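
The header field rules above can be checked before upload. The following is a minimal Python sketch, not part of Tamr Cloud: the function name check_header_fields and its messages are hypothetical, and the reserved-prefix comparison is done case-insensitively as a conservative assumption, since the requirements do not state whether case matters for prefixes.

```python
import re

MAX_NAME_LENGTH = 300
# Letters, digits, and underscores only; first character is a letter or underscore.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
RESERVED_PREFIXES = (
    "_TABLE_", "_FILE_", "_PARTITION",
    "_ROW_TIMESTAMP", "__ROOT__", "_COLIDENTIFIER",
)


def check_header_fields(names):
    """Return a list of problems found in the given header field names."""
    problems = []
    seen = {}  # lowercased name -> first spelling encountered
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"{name[:20]!r}...: longer than {MAX_NAME_LENGTH} characters")
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r}: invalid character or invalid first character")
        # Assumption: treat prefixes as case-insensitive to be conservative.
        if name.upper().startswith(RESERVED_PREFIXES):
            problems.append(f"{name!r}: starts with a reserved prefix")
        key = name.lower()
        if key in seen:
            problems.append(f"{name!r}: duplicates {seen[key]!r} (case-insensitive)")
        else:
            seen[key] = name
    return problems
```

An empty result means no rule in this sketch was violated. Names containing spaces are reported by the character check, so they do not need separate handling.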

About Primary Keys

Primary keys must be unique across all datasets. If a dataset does not contain a unique primary key column, create one by adding a column that contains the filename followed by row number (filename_rownumber).
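
One way to add such a column is sketched below in Python. This is an illustration under assumptions, not a Tamr Cloud utility: the function name add_primary_key, the column name primary_key, the use of the dataset name as the filename part, and row numbering that starts at 1 are all hypothetical choices.

```python
import csv
import io


def add_primary_key(csv_text, dataset_name, key_field="primary_key"):
    """Return CSV text with a leading key column of the form <dataset_name>_<row number>."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[key_field, *reader.fieldnames],
        lineterminator="\n",
    )
    writer.writeheader()
    # Assumption: data rows are numbered from 1, not counting the header row.
    for row_number, row in enumerate(reader, start=1):
        writer.writerow({key_field: f"{dataset_name}_{row_number}", **row})
    return out.getvalue()
```

Keys derived from row position change if rows are inserted, removed, or reordered, so it is safer to generate the column once and store it in the source file than to regenerate it on every update.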

Important: If you create your own primary key, you must preserve this key whenever you update the source files so that the persistent IDs assigned by Tamr Cloud are maintained.