Tagging Tables as Data Products

Category: Discoverability
Platform: Databricks

Context

We use Databricks as our data platform (TODO Link to architecture decision record).
We use Databricks Unity Catalog as our data catalog (TODO Link to architecture decision record).

Unity Catalog is a metastore that contains all schemas and tables: both team-internal tables (such as bronze and silver tables) and tables exposed to other teams as data products.

How do we identify data products in the Unity Catalog?

Options

A: Separate Catalog for data products

Data products are registered in a separate catalog called “data products”. Each team has its own schema in that catalog, and each table represents a data product. Tables are managed as external tables.

B: Separate Schema for data products

Each team has its own catalog. Each catalog has a schema named DATA_PRODUCTS. All tables in this schema are considered data products. A table can be a managed table, an external table, or a view.
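
Options A and B differ mainly in where a data product sits in Unity Catalog's three-level namespace (catalog.schema.table). The sketch below contrasts the two layouts. The catalog names data_products and fulfillment and the table name inventory_history are assumptions for illustration, and depending on how the metastore is configured, CREATE CATALOG may additionally need a MANAGED LOCATION clause.

```sql
-- Option A: one shared catalog (name assumed: data_products), one schema per team.
CREATE CATALOG IF NOT EXISTS data_products;
CREATE SCHEMA IF NOT EXISTS data_products.fulfillment;
-- A data product would be addressed as:
--   data_products.fulfillment.inventory_history

-- Option B: one catalog per team (name assumed: fulfillment),
-- each with a schema named DATA_PRODUCTS.
CREATE CATALOG IF NOT EXISTS fulfillment;
CREATE SCHEMA IF NOT EXISTS fulfillment.DATA_PRODUCTS;
-- A data product would be addressed as:
--   fulfillment.DATA_PRODUCTS.inventory_history
```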

C: Tagging

Tables that represent a data product carry a table property data_product_name that states the name of the data product. The presence of this property marks the table as a data product.

D: Prefix

The names of tables that represent data products are prefixed with DP_.

E: Access Control

Every table is a potential data product. We use access control to provide access for other teams.
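
Under option E, exposing a table to another team comes down to granting privileges in Unity Catalog. A minimal sketch, assuming a team catalog fulfillment, a schema gold, and a consumer group named analytics-team (all three names are hypothetical):

```sql
-- The consumer needs access to the enclosing catalog and schema
-- in addition to the table itself.
GRANT USE CATALOG ON CATALOG fulfillment TO `analytics-team`;
GRANT USE SCHEMA ON SCHEMA fulfillment.gold TO `analytics-team`;

-- Read access to the table that acts as a data product.
GRANT SELECT ON TABLE fulfillment.gold.inventory_history TO `analytics-team`;
```

With this approach, a table counts as a data product only implicitly, through its grants; nothing on the table itself marks it as one.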

Decision

We use table properties to identify data products (option C).

Each data product must have these table properties set:

property_key           property_value
data_product_name      (The readable name of the data product)
data_product_domain    (The name of the domain)
data_product_team      (The name of the team)
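
For a table that already exists, the three properties can be added without recreating it. A minimal sketch, assuming an existing table named inventory_history in the current catalog and schema; the property values are illustrative:

```sql
-- Mark an existing table as a data product by setting the required properties.
ALTER TABLE inventory_history SET TBLPROPERTIES (
  'data_product_name'   = 'inventory_history',
  'data_product_domain' = 'fulfillment',
  'data_product_team'   = 'FUFI'
);
```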

Consequences

Automation

-- Create an external table and mark it as a data product via table properties.
CREATE TABLE INVENTORY_HISTORY (sku STRING, quantity INT, updated TIMESTAMP)
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/fulfillment/inventory_history'
TBLPROPERTIES (
  'data_product_name' = 'inventory_history',
  'data_product_domain' = 'fulfillment',
  'data_product_team' = 'FUFI'
);
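
To check whether a table carries the required properties, SHOW TBLPROPERTIES can be used. A minimal sketch, assuming the table from the example above exists in the current catalog and schema:

```sql
-- List all properties of the table, including the data_product_* keys.
SHOW TBLPROPERTIES inventory_history;

-- Look up a single property; a table without it is not a data product
-- under this policy.
SHOW TBLPROPERTIES inventory_history ('data_product_name');
```

An automated check could run the second statement for each table and each of the three required keys.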