Publishing Data Product Data

You make data product datasets available to downstream systems by publishing them to a cloud storage destination.

Your ability to publish datasets depends on your user role and permissions.

The datasets that you can publish for a data product include Mastered Entities, Source Records by Cluster, and Cluster by Similarity. When you configure a publish destination, you select which datasets to publish to that location and which columns to include in each dataset. See Datasets Available for Export for information on these datasets.
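A destination's configuration pairs each selected dataset with the columns to include in it. The following is a minimal sketch of that idea only. The `PublishDestination` class, the `project` helper, and the column names in the usage note are hypothetical and are not part of the product; only the three dataset names come from this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Dataset names as listed in this article.
AVAILABLE_DATASETS = {
    "Mastered Entities",
    "Source Records by Cluster",
    "Cluster by Similarity",
}


@dataclass
class PublishDestination:
    """Hypothetical model of one publish destination's configuration.

    Maps each selected dataset to the columns to include in it.
    """
    name: str
    datasets: Dict[str, List[str]] = field(default_factory=dict)

    def validate(self) -> None:
        # Every selected dataset must be one the data product can publish.
        unknown = set(self.datasets) - AVAILABLE_DATASETS
        if unknown:
            raise ValueError(f"Unknown datasets: {sorted(unknown)}")
        # Each selected dataset needs at least one column.
        for dataset, columns in self.datasets.items():
            if not columns:
                raise ValueError(f"No columns selected for {dataset!r}")


def project(record: dict, columns: List[str]) -> dict:
    """Keep only the configured columns of a record, in the configured order."""
    return {col: record.get(col) for col in columns}
```

For example, `PublishDestination("analytics-bucket", {"Mastered Entities": ["entity_id", "name"]})` validates, and projecting a record with an extra `city` field through `["entity_id", "name"]` drops `city`. Columns you do not select for a dataset are simply absent from what is published.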

When you publish data to cloud storage, any data already published to the destination for the data product is overwritten.

Before You Begin:
Because publishing overwrites any data already published to the destination, back up the target file or table before publishing.
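If the destination is a Snowflake table, one way to back it up is Snowflake's `CREATE TABLE ... CLONE` statement, which creates a zero-copy clone. The sketch below only builds that statement; the `backup_statement` helper and the `_backup_YYYYMMDDHHMMSS` naming scheme are hypothetical, and it assumes simple, unquoted table names. Run the resulting statement with whatever Snowflake client you normally use. For files in cloud storage, copy the file with your storage provider's own tools instead.

```python
from datetime import datetime, timezone
from typing import Optional


def backup_statement(table: str, now: Optional[datetime] = None) -> str:
    """Build a Snowflake statement that clones `table` to a timestamped backup.

    Hypothetical helper: the backup naming scheme is only an illustration,
    and identifiers are assumed to be simple and unquoted.
    """
    now = now or datetime.now(timezone.utc)
    suffix = now.strftime("%Y%m%d%H%M%S")
    # CREATE TABLE <new> CLONE <source> is Snowflake's zero-copy clone syntax.
    return f"CREATE TABLE {table}_backup_{suffix} CLONE {table};"
```

For example, backing up `MASTERED_ENTITIES` at 2024-01-02 03:04:05 UTC yields `CREATE TABLE MASTERED_ENTITIES_backup_20240102030405 CLONE MASTERED_ENTITIES;`. The timestamp keeps backups from successive publish jobs from colliding.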

Important Notes for Snowflake:

  • If you are publishing to Snowflake and have added or removed output fields since the last publish job, you must either update the destination Snowflake table to match the updated schema or delete the destination table; otherwise, the publish job fails. If you delete the destination table before publishing, the table is recreated with the updated schema.
  • To view the published dataset, you must have read access in Snowflake to the table to which it was published. Contact your Snowflake administrator if you cannot view the published datasets.
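If you choose to update the destination table rather than delete it, the required changes are the difference between the table's current columns and the current output fields. The sketch below computes that difference and emits Snowflake `ALTER TABLE ... ADD COLUMN` and `ALTER TABLE ... DROP COLUMN` statements. The `schema_changes` helper is hypothetical; it assumes you already know each output field's Snowflake column type and that all identifiers are unquoted (Snowflake stores those in upper case). Dropping a column discards its data, so back up the table first.

```python
from typing import Dict, Iterable, List


def schema_changes(
    table: str,
    table_columns: Iterable[str],
    output_fields: Dict[str, str],
) -> List[str]:
    """Return ALTER TABLE statements that align `table` with the output fields.

    table_columns: column names currently in the destination Snowflake table.
    output_fields: output field name -> Snowflake column type (assumed known).
    """
    # Unquoted Snowflake identifiers resolve to upper case, so compare that way.
    existing = {c.upper() for c in table_columns}
    wanted = {name.upper(): dtype for name, dtype in output_fields.items()}
    wanted_names = set(wanted)

    statements = []
    # Output fields added since the last publish need new columns.
    for name in sorted(wanted_names - existing):
        statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {wanted[name]};")
    # Output fields removed since the last publish need their columns dropped.
    for name in sorted(existing - wanted_names):
        statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements
```

For a table with columns `ENTITY_ID`, `NAME`, and `FAX` and output fields `entity_id`, `name`, and `email`, this returns one statement adding `EMAIL` and one dropping `FAX`. An empty list means the schemas already match. The sketch does not detect type changes on existing columns.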

To publish data product datasets:

  1. Open the data product from the home page.
  2. Select the Publish page.
  3. If the Publish table does not include the destination that you want to use, add a new destination. See Adding a Publish Destination for instructions.
  4. In the table, select Publish for the destination to which you want to publish the data, and then confirm.
    The datasets configured for that destination are published to the cloud storage location.

You can monitor the progress of the publish job.