Healthcare Providers Mastering

Use the Healthcare Providers template to master healthcare provider and practitioner data using an industry-standard schema and a trained machine learning model. The template includes address enrichment, which helps ensure that you have the most complete and up-to-date information for each provider. You can use this information, for example, to help streamline operations and provide optimal patient care.

Input Dataset Requirements

As part of the mastering flow, Tamr Cloud aligns your input datasets to a unified schema with predefined output fields. Tamr Cloud uses these predefined output fields to enrich your data and consolidate similar records into entities.

In addition to the general Requirements for Input Datasets, certain data is required for Healthcare Providers mastering.

You must map one or more input fields to each of the predefined output fields:

  • unique_key
    This is the primary key for the dataset. See About Primary Keys for more information.
  • address_line_1
  • address_line_2
  • city
  • country
  • credentials
  • first_name
  • last_name
  • middle_name
  • name_suffix
  • gender
  • organization_name
  • provider_specialty
  • phone_number
  • postal_code
  • region
  • trusted_id (for example: National Provider Identifier (NPI))
    The clustering model considers records with the same value for this field to be a match. Map a field to this output field only if records that have the same value for this field should be considered a match. If the values in the input do not represent a definite match, map a placeholder field to trusted_id.
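
Before mapping, it can help to check a planned mapping against the output fields listed above. The following is a minimal Python sketch of such a check. It is not part of Tamr Cloud: the unmapped_output_fields helper, the planned_mapping dictionary, and the input field names are hypothetical, and the required list is shortened for the example.

```python
def unmapped_output_fields(required_fields, planned_mapping):
    """Return the required output fields that have no input field mapped to them."""
    return [
        field for field in required_fields
        if not planned_mapping.get(field)  # key is missing or its list is empty
    ]


# Shortened list for illustration; the full list of output fields appears above.
required = ["unique_key", "first_name", "last_name", "postal_code", "trusted_id"]

# Hypothetical plan: output field -> input fields from a source dataset.
planned_mapping = {
    "unique_key": ["provider_row_id"],
    "first_name": ["fname"],
    "last_name": ["lname", "surname"],
    "postal_code": [],
}

missing = unmapped_output_fields(required, planned_mapping)
print(missing)
```

Any field reported as missing still needs at least one input field (or, for trusted_id, a placeholder field) before the mapping is complete.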

Clustering Model

For Healthcare Providers mastering, the clustering model considers the similarity of values for the following fields, and then uses decision-tree logic to identify records that refer to the same entity:

  • Full address
  • Street address
  • City
  • Region
  • Country code
  • Postal code
  • First three digits of the postal code
  • Phone number
  • First name
  • Last name
  • Middle name
  • Name suffix
  • Organization name
  • Specialty
  • Credentials

Except in rare edge cases, records with the same trusted ID are clustered together, while records with different trusted IDs are not clustered together.

Tamr Cloud looks for similarities in these fields, not exact matches. For example, two addresses on the same street can correspond to the same provider.
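
To illustrate the difference between similarity and exact matching, here is a toy Python sketch. It is not the trained model: it uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure, the 0.8 threshold is an arbitrary assumption, and the likely_same_provider rule ignores the rare edge cases mentioned above.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0.0-1.0 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_same_provider(rec_a, rec_b, threshold=0.8):
    """Toy rule: trusted IDs decide when both are present; otherwise
    fall back to address similarity against an assumed threshold."""
    id_a, id_b = rec_a.get("trusted_id"), rec_b.get("trusted_id")
    if id_a and id_b:
        return id_a == id_b
    return similarity(rec_a["address"], rec_b["address"]) >= threshold


# Made-up records; the trusted_id values are placeholders, not real NPIs.
no_id_a = {"trusted_id": None, "address": "100 Main Street, Suite 4"}
no_id_b = {"trusted_id": None, "address": "100 Main St, Suite 4"}
with_id_a = {"trusted_id": "ID-001", "address": "100 Main Street, Suite 4"}
with_id_b = {"trusted_id": "ID-002", "address": "100 Main Street, Suite 4"}

# Similar (not identical) addresses, no trusted IDs.
print(likely_same_provider(no_id_a, no_id_b))
# Identical addresses, but different trusted IDs.
print(likely_same_provider(with_id_a, with_id_b))
```

The first pair is treated as a match even though the address strings differ, while the second pair is kept apart because the trusted IDs disagree.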

Modifying the Mastering Flow

Designer Flow Step Overview

When you create an entity type using the Healthcare Providers template, Tamr Cloud creates a mastering flow in Designer with steps specific to healthcare provider mastering:

Add Data

Ensure your input data meets both the general requirements for input datasets and the specific requirements for this template.

Then, see Adding a Dataset.

Align to Healthcare Provider Data Model

See Mapping Input Fields to a Unified Schema.

Prepare Data for Enrichment

This step transforms the data in the unified dataset to match the expected inputs to the enrichers included in the mastering flow.

If you added a new field in the Schema Mapping step, add a line to the transformations in this step to handle null values for that field. See Modifying Transformations for Your Data.

Enrich Address

This step standardizes and enriches address data. You do not need to make changes to this step.

See Address Enrichment for information about this enricher, including the output fields it adds to your data.

Prepare for Clustering

This step transforms the data in the unified dataset to create the fields used by the trained clustering model to identify similar and matching records. The fields created as input to the model are prefixed with ml_. Many of these ml_ fields are created as arrays of unified source fields and fields added by the enrichment services. The model identifies the most similar values across the arrays and assigns weights based on these similarities.
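
The idea of comparing arrays of values can be sketched in Python as follows. This is only an illustration under stated assumptions, not the model's actual logic: the enriched_address_line_1 field name is hypothetical, difflib.SequenceMatcher stands in for the model's similarity measure, and no weighting is applied.

```python
from difflib import SequenceMatcher
from itertools import product


def build_ml_array(record, field_names):
    """Collect the non-empty values of several fields into one list."""
    return [record[name] for name in field_names if record.get(name)]


def best_pair_similarity(values_a, values_b):
    """Return the highest similarity ratio over all cross-record value pairs."""
    scores = [
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in product(values_a, values_b)
    ]
    return max(scores, default=0.0)


# Hypothetical field names: one unified source field, one enrichment output.
address_fields = ["address_line_1", "enriched_address_line_1"]

rec_a = {"address_line_1": "100 Main St", "enriched_address_line_1": "100 MAIN STREET"}
rec_b = {"address_line_1": "100 Main Street", "enriched_address_line_1": None}

ml_address_a = build_ml_array(rec_a, address_fields)
ml_address_b = build_ml_array(rec_b, address_fields)

best = best_pair_similarity(ml_address_a, ml_address_b)
print(ml_address_a, ml_address_b, best)
```

Because each record contributes several candidate values, the enriched form of one record's address can match the raw form of another's even when the raw values differ.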

Adding Provider Specialty Code Mappings

Because the clustering model uses full-text values for specialty, some of the transformations in this step map input provider specialty codes to full text, as shown in the code sample below. If needed, you can add more mappings, following the pattern in the transformation code.

//provider_specialty as ml_provider_specialty;

select *,
case
 when provider_specialty == 'NRP' then 'Nurse Practitioner'
 when provider_specialty == 'FM' then 'Family Medicine'
 when provider_specialty == 'IM' then 'Internal Medicine'
 when provider_specialty == 'PHA' then 'Physician Assistant'
 when provider_specialty == 'OBG' then 'Obstetrics / Gynecology'
 when provider_specialty == 'EM' then 'Emergency Medicine'
 when provider_specialty == 'END' then 'Endocrinology, Diabetes and Metabolism'
 when provider_specialty == 'RHU' then 'Rheumatology'
 when provider_specialty == 'ID' then 'Infectious Disease'
 when provider_specialty == 'PD' then 'Pediatrics'
 when provider_specialty == 'ON' then 'Medical Oncology'
 when provider_specialty == 'ORS' then 'Orthopedic Surgery'
 when provider_specialty == 'CD' then 'Cardiovascular Disease'
 when provider_specialty == 'N' then 'Neurology'
 when provider_specialty == 'P' then 'Psychiatry'
 when provider_specialty == 'HO' then 'Hematology/Oncology'
 when provider_specialty == 'GP' then 'General Practice'
 when provider_specialty == 'CHP' then 'Child and Adolescent Psychiatry'
 when provider_specialty == 'GS' then 'General Surgery'
 when provider_specialty == 'DR' then 'Diagnostic Radiology'
 when provider_specialty == 'U' then 'Urology'
 when provider_specialty == 'D' then 'Dermatology'
 when provider_specialty == 'PUD' then 'Pulmonary Disease'
 when provider_specialty == 'GE' then 'Gastroenterology'
 when provider_specialty == 'PTH' then 'Anatomic/Clinical Pathology'
 when provider_specialty == 'NS' then 'Neurological Surgery'
 when provider_specialty == 'TS' then 'Thoracic Surgery'
 when provider_specialty == 'NEP' then 'Nephrology'
 when provider_specialty == 'OM' then 'Occupational Medicine'
 when provider_specialty == 'GPM' then 'General Preventive Medicine'
 when provider_specialty == 'MPD' then 'Internal Medicine/Pediatrics'
 when provider_specialty == 'HEM' then 'Hematology'
 when provider_specialty == 'PHO' then 'Pediatric Hematology/Oncology'
 when provider_specialty == 'AN' then 'Anesthesiology'
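 when provider_specialty == 'PM' then 'Physical Medicine and Rehabilitation'
 // Assumed completion of the excerpt: keep the original value when no code matches.
 else provider_specialty
end as ml_provider_specialty;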

Other than optionally adding specialty code mappings, you do not need to make changes to this step.

Apply Clustering Model

This read-only step groups records that refer to the same entity into a cluster, using the trained model.

Note: You can publish the output of this step by publishing the "Source Records by Entity" dataset. See Available Published Datasets.

Consolidate Records

This step applies rules to produce a single entity record that best represents a cluster. For most fields, these rules select the most common value from the clustered records.
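
A "most common value" rule can be sketched in Python as shown below. This is a simplified illustration, not the template's actual consolidation rules: the consolidate and most_common_value helpers are hypothetical, empty values are skipped by assumption, and ties resolve to the value seen first.

```python
from collections import Counter


def most_common_value(values):
    """Return the most frequent non-empty value, or None if there is none."""
    counts = Counter(v for v in values if v)
    if not counts:
        return None
    # On ties, Counter.most_common keeps first-encountered order.
    return counts.most_common(1)[0][0]


def consolidate(cluster, fields):
    """Build one entity record from a list of clustered source records."""
    return {
        field: most_common_value(record.get(field) for record in cluster)
        for field in fields
    }


# Made-up cluster of three source records that refer to the same provider.
cluster = [
    {"first_name": "Ana", "city": "Boston", "name_suffix": None},
    {"first_name": "Ana", "city": "boston", "name_suffix": None},
    {"first_name": "Anna", "city": "Boston", "name_suffix": None},
]

entity = consolidate(cluster, ["first_name", "city", "name_suffix"])
print(entity)
```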

Additionally, this step adds a Tamr ID (tamr_id) to each entity. The Tamr ID is a unique, persistent ID.

If you added a new field in the Schema Mapping step, add a line to the transformations in this step that tells Tamr Cloud which value to set for that field when creating the mastered entity record. See Modifying Transformations for Your Data.

Deliver to Studio

This step allows you to configure how entity data appears in Studio, Curator, and published datasets. See Configuring Data Display in Studio.