The following table provides general requirements for input datasets. In addition to these requirements, entity type templates have specific input dataset requirements. See Tamr Cloud Templates.
See Troubleshooting and Known Issues for the latest known issues and limitations for input datasets.
|Dataset/File Name||Each input dataset must have a unique name.|
|Primary Key||Each input dataset must have a unique primary key field. See the About Primary Keys section below for more information.|
|File Extension (uploaded source files)||- CSV (non-BOM)|
- TSV (non-BOM)
|File Size (uploaded source files)||100MB maximum|
|Delimiters (uploaded source files)||- Comma|
|Header Fields||- Maximum length is 300 characters.|
- Field names must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_).
- Field names must be unique (case-insensitive).
- Field names cannot include spaces. Remove any leading or trailing spaces from column names, and remove any spaces within the names.
- Quoted values are allowed (for uploaded files).
- Field names cannot include any of the following prefixes:
|Data Fields||- All fields must be |
- Format: UTF-8 or Windows-1252
- Double-quoted values are allowed.
- Data in each row must map to the header fields.
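Several of the requirements in the table can be checked locally before uploading a file. The following is a minimal sketch, not a Tamr Cloud tool: the function names `check_header_fields` and `check_file_bytes` are hypothetical, the 300-character limit is assumed to apply to each field name, and "100MB" is assumed to mean 100 × 1024 × 1024 bytes. It does not check for disallowed prefixes.

```python
import re

MAX_HEADER_LENGTH = 300                   # assumed to apply per field name
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024   # assumed reading of "100MB maximum"
UTF8_BOM = b"\xef\xbb\xbf"
FIELD_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def check_header_fields(field_names):
    """Return a list of problems found in the header field names."""
    problems = []
    seen = {}  # lowercased name -> first spelling encountered
    for name in field_names:
        if len(name) > MAX_HEADER_LENGTH:
            problems.append(f"{name[:20]!r}...: longer than {MAX_HEADER_LENGTH} characters")
        # Letters, digits, and underscores only; this also rejects spaces and empty names.
        if not FIELD_NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r}: only letters, numbers, and underscores are allowed")
        key = name.lower()
        if key in seen:
            problems.append(f"{name!r}: duplicates {seen[key]!r} (case-insensitive)")
        else:
            seen[key] = name
    return problems


def check_file_bytes(raw):
    """Return a list of problems found in the raw bytes of an uploaded file."""
    problems = []
    if len(raw) > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 100MB")
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 byte order mark (BOM)")
    return problems


print(check_header_fields(["id", "First Name", "EMAIL", "email"]))
```

An empty list from either function means none of the checked rules were violated; it does not guarantee that the dataset satisfies every requirement in the table.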
About Primary Keys
Primary keys must be unique across all datasets. If a dataset does not contain a unique primary key column, create one by adding a column that contains the filename followed by the row number.
Important: If you create your own primary key, you must preserve it whenever you update the source files so that the persistent IDs assigned by Tamr Cloud are maintained.
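One way to build such a key column is sketched below with Python's standard `csv` module. The function name `add_primary_key`, the column name `record_id`, the underscore separator, and the use of the filename without its extension are all assumptions made for illustration; the exact key format is not prescribed here.

```python
import csv
import io
from pathlib import Path


def add_primary_key(csv_text, filename, key_field="record_id", separator="_"):
    """Return CSV text with a leading key column: <file stem><separator><row number>."""
    stem = Path(filename).stem  # assumption: drop the extension from the filename
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader)
    # Field names must be unique case-insensitively, so refuse to shadow an existing column.
    if key_field.lower() in (name.lower() for name in header):
        raise ValueError(f"column {key_field!r} already exists")
    writer.writerow([key_field] + header)

    # Data rows are numbered from 1; the header row is not counted.
    for row_number, row in enumerate(reader, start=1):
        writer.writerow([f"{stem}{separator}{row_number}"] + row)
    return out.getvalue()


source = "name,city\nAda,London\nBob,Paris\n"
print(add_primary_key(source, "customers.csv"))
# record_id,name,city
# customers_1,Ada,London
# customers_2,Bob,Paris
```

Because the key depends on row position, it stays stable across updates only if existing rows keep their positions. When updating a source file, keep the previously generated key values in place rather than regenerating them.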