Configuring the B2B Customers Data Product
This topic describes how to configure the B2B Customers data product.
Adding Data to Your Data Product
Add input data to a data product using a previously uploaded source data file. To learn how to upload sources to Tamr Cloud, see Managing Source Datasets.
To add datasets, go to the Configure Data Product tab, also known as the Settings page. Use the input dropdown to add sources one at a time: select the input, then select Add Source.
Configuring Attributes
Map attributes from your source records to the attributes in the industry-standard schema for your selected data product. Where possible, Tamr Cloud automaps input attributes to appropriate attributes in the unified schema.
AutoMapping Attributes
The AutoMap feature can help you quickly map source columns to appropriate attributes in the unified schema, by:
- Identifying source columns that match previously mapped source columns. AutoMap applies the same mapping for the matching columns.
- Identifying source columns that match unified schema attributes. AutoMap maps these columns to their matching unified schema attributes.
AutoMap considers a column and an attribute to be a match when they contain the same words once delimiter characters are ignored. Plural forms and partial matches do not count as matches. Boundaries in camel case names are recognized as delimiters, but lowercase characters are not. AutoMap does not map two columns from a single source to the same output attribute, and does not create an output attribute if no match is found for a selected column.
Two names are considered a match if the names are an exact match when split on:
- The following characters: \ _ ( ) / \
- The boundary between lowercase and uppercase letters.
- The boundary between letters and numbers.
- Whitespace characters.
Example | Resulting Action |
---|---|
Match cases | Match: addressLine1, address_line_1, and Address Line (1). These 3 columns would be mapped to the unified schema attribute address_line_1. |
Delimiters accepted | Match: company_id and company id. These 2 columns would be mapped to the unified schema attribute company_id. |
Delimiters not accepted | No match: address\|Line\|1 and addressLine1. No mapping. Pipe character delimiters are not accepted. |
Plurals | No match: region and regions. No mapping. |
Partials | No match: Primary Street Address and Street Address. No mapping. |
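The splitting rules and the examples in the table can be sketched in Python. This is a minimal illustration, not Tamr Cloud's implementation: the function names tokenize and names_match are hypothetical, the delimiter set is an assumption read from the list above, and the case-insensitive comparison is inferred from the first table row.

```python
import re

# Assumed delimiter set, read from the list above: _ ( ) / \ and whitespace.
_DELIMITERS = re.compile(r"[_()/\\\s]+")
# Zero-width boundaries: lowercase->uppercase, letter->digit, digit->letter.
_BOUNDARIES = re.compile(
    r"(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])"
)


def tokenize(name: str) -> list[str]:
    """Split a column or attribute name into lowercase words."""
    spaced = _BOUNDARIES.sub(" ", name)
    return [word.lower() for word in _DELIMITERS.split(spaced) if word]


def names_match(left: str, right: str) -> bool:
    """Two names match only when their word lists are identical."""
    return tokenize(left) == tokenize(right)


print(tokenize("addressLine1"))                        # ['address', 'line', '1']
print(names_match("addressLine1", "Address Line (1)")) # True
print(names_match("address|Line|1", "addressLine1"))   # False: pipe is not a delimiter
print(names_match("region", "regions"))                # False: plurals do not match
```

Comparing whole word lists, rather than checking for shared words, is what rules out partial matches such as Primary Street Address and Street Address.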
Mapping Attributes
To map an input attribute from your source, select the dropdown in your source’s column, and then choose the appropriate value.
When mapping attributes, there are required, suggested, and optional attributes for you to map:
- Required: You must map source columns to these attributes. You cannot refresh your data until you do this.
- Suggested: For optimal data quality, enrichment, and clustering results, map source columns to these attributes.
- Optional: These attributes have minimal impact on your clustering and enrichment results. If your source data includes columns that match these attributes, map them to include that source data in your completed data product.
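As a rough illustration of the Required rule, the check below lists required attributes that no source column is mapped to yet. The function name, the dictionary layout, and the attribute names are hypothetical and chosen only for this sketch.

```python
def missing_required(mapping: dict[str, str], required: set[str]) -> set[str]:
    """Return required unified attributes that no source column maps to."""
    return required - set(mapping.values())


# Hypothetical mapping: source column -> unified schema attribute.
mapping = {"Company Name": "company_name", "AKA": "alternative_names"}
required = {"primary_key"}  # hypothetical set of required attributes

print(missing_required(mapping, required))  # {'primary_key'}
```

An empty result would mean every required attribute is mapped, which is the precondition for refreshing the data.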
For example, in the image below, the primary key attribute needs to be mapped. Additionally, you can see Tamr Cloud automatically mapped an attribute for Company Name. Alternative names is not required, but you can choose to map the attribute.
Select the icon next to the source name to preview the dataset, which can help you decide how to map each attribute.
Record Consolidation Rules
Record consolidation rules create a single record, called the mastered entity record, which best represents a cluster of similar source records. You can configure rules that determine the appropriate values from the cleaned, standardized, and validated source records to include in the mastered entity record.
Select the attribute name to set rules.
You can set attributes to be consolidated to:
- Tamr-recommended value
- Distinct values
- Most common value
- Longest value
By default, attributes are set to be consolidated to the Tamr-recommended value.
Examples of Tamr-recommended values include:
- For address attribute values, Tamr recommends values chosen from the source record with the most valid address in the cluster.
- For the alternative_names attribute value, Tamr recommends the set of all company_names and alternative_names for the records in that cluster, excluding the value that is selected for the mastered entity’s company_name.
When you consolidate an attribute to its distinct values, Tamr Cloud concatenates all the attribute’s unique values into a pipe-delimited string.
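The three configurable strategies can be sketched as follows. This is an illustration under stated assumptions rather than Tamr Cloud's logic: the function names are hypothetical, empty values are skipped, distinct values keep first-seen order, and ties go to the value seen first. The Tamr-recommended strategy is not modeled.

```python
from collections import Counter


def distinct_values(values):
    """Unique non-empty values, in first-seen order, joined with pipes."""
    return "|".join(dict.fromkeys(v for v in values if v))


def most_common_value(values):
    """Most frequent non-empty value; ties go to the first one seen (assumption)."""
    present = [v for v in values if v]
    return Counter(present).most_common(1)[0][0] if present else None


def longest_value(values):
    """Longest non-empty value; ties go to the first one seen (assumption)."""
    present = [v for v in values if v]
    return max(present, key=len) if present else None


names = ["Acme Inc", "Acme Inc", "ACME Incorporated", None]
print(distinct_values(names))    # Acme Inc|ACME Incorporated
print(most_common_value(names))  # Acme Inc
print(longest_value(names))      # ACME Incorporated
```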
Writing Rule Conditions
When you consolidate an attribute to its distinct values, or to the most common or longest value, you can add further conditions to exclude records with empty attributes, or to exclude, constrain, or prioritize records based on certain attribute values. The examples below show how each condition works.
Exclusions
For example, in the image below, Address is set to the most common value, with the condition to exclude any records with empty values for City. This means records with empty values for the specified attribute will be excluded when identifying the most common value from the clustered source records.
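A minimal sketch of this exclusion, assuming records are plain dictionaries; the function and field names are hypothetical.

```python
from collections import Counter


def most_common_excluding_empty(records, attribute, must_be_present):
    """Most common `attribute` value among records where `must_be_present` is non-empty."""
    eligible = [r for r in records if r.get(must_be_present)]
    counts = Counter(r[attribute] for r in eligible if r.get(attribute))
    return counts.most_common(1)[0][0] if counts else None


records = [
    {"address": "1 Main St", "city": ""},
    {"address": "1 Main St", "city": None},
    {"address": "1 Main Street", "city": "Boston"},
]
print(most_common_excluding_empty(records, "address", "city"))  # 1 Main Street
```

Without the exclusion, 1 Main St would be the most common value; with it, only the record that has a City value is considered.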
Constraints
You could set Company Name to the most common value, when considering only records containing the maximum number in the Founding Year column.
Below, Company Name is set to the most common value, when considering only records with the most common value in the Country column.
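Both constraints can be sketched as a filter applied before choosing the most common value. The helper names and record layout are hypothetical, and tie handling is an assumption.

```python
from collections import Counter


def most_common(values):
    """Most frequent non-null value, or None when there are no values."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else None


def constrain_to_max(records, column):
    """Keep only records holding the maximum value in `column`."""
    present = [r for r in records if r.get(column) is not None]
    if not present:
        return []
    top = max(r[column] for r in present)
    return [r for r in present if r[column] == top]


def constrain_to_most_common(records, column):
    """Keep only records holding the most common value in `column`."""
    top = most_common(r.get(column) for r in records)
    return [r for r in records if top is not None and r.get(column) == top]


records = [
    {"company_name": "Acme", "founding_year": 1990, "country": "US"},
    {"company_name": "Acme", "founding_year": 1990, "country": "US"},
    {"company_name": "Acme Corp", "founding_year": 2001, "country": "US"},
    {"company_name": "Acme GmbH", "founding_year": 1995, "country": "DE"},
]
by_year = constrain_to_max(records, "founding_year")
by_country = constrain_to_most_common(records, "country")
print(most_common(r["company_name"] for r in by_year))     # Acme Corp
print(most_common(r["company_name"] for r in by_country))  # Acme
```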
Prioritizations
Prioritizing a dataset means that when there is a tie for the most common or longest value, the value from the prioritized dataset is chosen. In the image below, Postal Code is set to the longest value, with the conditions to prioritize records from the source b2b_customers_le_sf_firmographic and exclude records from the source b2b_customers_le_phone_url.
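The tie-breaking behavior can be sketched as follows. The function name, the source field on each record, and the dataset other_source are hypothetical; only the two b2b_customers dataset names come from the example above.

```python
def longest_with_priority(records, attribute, prioritized_source, excluded_source):
    """Longest value, skipping one source and breaking ties in favor of another."""
    eligible = [
        r for r in records
        if r["source"] != excluded_source and r.get(attribute)
    ]
    if not eligible:
        return None
    # Compare by length first; on equal length, the prioritized source wins.
    best = max(
        eligible,
        key=lambda r: (len(r[attribute]), r["source"] == prioritized_source),
    )
    return best[attribute]


records = [
    {"source": "b2b_customers_le_phone_url", "postal_code": "02139-4307"},
    {"source": "other_source", "postal_code": "02139"},
    {"source": "b2b_customers_le_sf_firmographic", "postal_code": "02142"},
]
print(longest_with_priority(
    records, "postal_code",
    prioritized_source="b2b_customers_le_sf_firmographic",
    excluded_source="b2b_customers_le_phone_url",
))  # 02142
```

The longest value overall comes from the excluded source, so it is ignored; the remaining two values tie on length, and the prioritized source decides the result.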
Uniformity Score
For each attribute, select whether to calculate the uniformity score. The uniformity score provides insight into how similar the values for this attribute are within the source record cluster. Uniformity scores range from 0 to 1. A uniformity score of 1 for an attribute means that all records in the cluster have the same value for this attribute, while a uniformity score of 0 indicates that all records in this cluster have different values for this attribute.
Note: Tamr automatically calculates the overall uniformity score for each cluster, which indicates how similar clustered source records are to each other.
When you choose to calculate the uniformity score of an attribute, a uniformity score icon appears next to the attribute name.
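The exact formula Tamr uses for the uniformity score is not given here. The sketch below uses one simple formula, chosen only because it reproduces the two endpoints described above (1 when all values are the same, 0 when all are different); the function name and the treatment of single-record clusters are assumptions.

```python
def uniformity_score(values):
    """Illustrative score in [0, 1]; not Tamr's published formula."""
    n = len(values)
    if n <= 1:
        return 1.0  # assumption: a single record is trivially uniform
    distinct = len(set(values))
    # 1 distinct value -> 1.0; n distinct values -> 0.0
    return 1.0 - (distinct - 1) / (n - 1)


print(uniformity_score(["US", "US", "US"]))  # 1.0
print(uniformity_score(["US", "DE", "FR"]))  # 0.0
print(uniformity_score(["US", "US", "DE"]))  # 0.5
```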
Configuring Enrichment
Select one or more enrichment sources for this data product. See Tamr Firmographic Enrichment for more information on data providers.
Configuring Clustering Rules
You can fine-tune clustering results by applying clustering rules. After the model has clustered source records, these rules can match or split clusters based on values in specific attributes. For example, if you trust that a user’s social security number uniquely identifies a person, you may want to create a rule that always clusters together records with matching social security numbers and does not cluster records with different social security numbers.
You can add up to three rules. See Understanding Clustering for more information on clustering.
There are three types of clustering rules:
- Match: Matches clusters with matching values for a specified attribute, such as company_name. Match rules will not match clusters that contain only null or empty values for the specified attribute.
- Split: Splits clusters that contain records with different values for a specific attribute. The rule splits the cluster so that each new cluster contains records with matching values for the attribute.
- Both: For a specified attribute, matches clusters with matching values and splits clusters that contain records with different values.
Each rule is numbered. After the data product runs, the Applied Clustering Rules (clustering_metadata.applied_clustering_rules) attribute in the source records dataset provides the number assigned to any rules applied to the record.
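The Match and Split rule types can be sketched on small in-memory clusters. This is an illustration only: the function names and record layout are hypothetical, and details such as how null values are grouped by a Split rule, or how a Both rule orders the two steps, are assumptions or left out.

```python
from collections import defaultdict


def split_cluster(cluster, attribute):
    """Split one cluster so each new cluster shares a single attribute value."""
    groups = defaultdict(list)
    for record in cluster:
        groups[record.get(attribute)].append(record)  # assumption: nulls group together
    return list(groups.values())


def match_clusters(clusters, attribute):
    """Merge clusters that share a non-empty value for the attribute."""
    merged = []  # list of (set of values, records); value sets stay disjoint
    for cluster in clusters:
        values = {r[attribute] for r in cluster if r.get(attribute)}
        records = list(cluster)
        kept = []
        for other_values, other_records in merged:
            if values & other_values:
                values |= other_values
                records = other_records + records
            else:
                kept.append((other_values, other_records))
        merged = kept + [(values, records)]
    return [records for _, records in merged]


clusters = [
    [{"id": 1, "company_name": "Acme"}],
    [{"id": 2, "company_name": "Acme"}, {"id": 3, "company_name": None}],
    [{"id": 4, "company_name": None}],
]
result = match_clusters(clusters, "company_name")
print([[r["id"] for r in c] for c in result])  # [[1, 2, 3], [4]]

cluster = [{"id": 1, "country": "US"}, {"id": 2, "country": "DE"}, {"id": 3, "country": "US"}]
print([[r["id"] for r in c] for c in split_cluster(cluster, "country")])  # [[1, 3], [2]]
```

The cluster holding only a null company_name is never merged, mirroring the statement that Match rules do not match clusters containing only null or empty values.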
Configuring Data Cleaning
The data cleaning feature enables you to replace known bad values in your source data with null. This helps to ensure that these values are not used for matching or included in your golden records.
The values that you specify are replaced with null wherever they appear in the mapped data product schema. For example, you may want to remove values such as NaN, empty, test, and so on.
Specify a value to be cleaned by selecting Add Value and then entering the exact string to be replaced with null. You can easily edit or delete values.
After running the data product, the null replacement values are included in the enhanced source records dataset and in the golden records (with the exception of address fields, for which the original values are retained). See Publishing Data Products for more information about enhanced source records.
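The replacement itself can be sketched as an exact-string lookup over a record's mapped values. The function name and record layout are hypothetical, case-sensitive comparison is an assumption based on the phrase "exact string", and the address-field exception for golden records is not modeled.

```python
def clean_record(record, bad_values):
    """Replace any value that exactly equals a configured bad value with None."""
    return {
        key: None if isinstance(value, str) and value in bad_values else value
        for key, value in record.items()
    }


bad_values = {"NaN", "empty", "test"}
record = {"company_name": "test", "city": "Boston", "phone": "NaN", "notes": "Test"}
print(clean_record(record, bad_values))
# {'company_name': None, 'city': 'Boston', 'phone': None, 'notes': 'Test'}
```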
Running Your Data Product and Viewing Results
To run your data product and apply all enrichment, clustering, and record consolidation rules, scroll to the top of the page and in the top right corner, select Refresh Data. Below the Refresh Data button, you can see the last time your results were refreshed.
To view your results, go to the Entities page, or go to Insights to see key metrics.