Data Mesh Governance / Policies / Discoverability
We use Databricks as our data platform (TODO Link to architecture decision record).
We use Databricks Unity Catalog as our data catalog (TODO Link to architecture decision record).
Unity Catalog is a metastore that contains all schemas and tables, both team-internal tables (such as bronze and silver tables) and data exposed to other teams as data products.
How do we identify data products in the Unity Catalog?
A: Separate Catalog for data products
Data products are registered in a separate catalog called “data products”. Each team has its own schema in this catalog, and each table represents a data product. Tables are managed as external tables.
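A minimal sketch of how option A could look in Databricks SQL. The catalog name `data_products`, the team schema `fulfillment`, the columns, and the storage path placeholder are assumptions for illustration, not agreed names.

```sql
-- Hypothetical names: one shared catalog, one schema per team.
CREATE CATALOG IF NOT EXISTS data_products;
CREATE SCHEMA IF NOT EXISTS data_products.fulfillment;

-- Each table is one data product, registered as an external table.
-- The path is a placeholder; it must point to storage that the
-- metastore can reach through an external location.
CREATE TABLE IF NOT EXISTS data_products.fulfillment.inventory_history (
  sku STRING,
  quantity INT,
  updated TIMESTAMP
)
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/fulfillment/inventory_history';
```

With this layout, listing the tables of the catalog's schemas is enough to enumerate all data products.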
B: Separate Schema for data products
Each team has its own catalog. Each catalog has a schema named DATA_PRODUCTS. All tables in this schema are considered data products. A table can be a managed table, an external table, or a view.
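Option B could be sketched as follows, assuming a hypothetical team catalog `fulfillment` with an internal `silver` schema. The view shows one way to expose an internal table as a data product without copying it.

```sql
-- Hypothetical team catalog with the agreed schema name.
CREATE CATALOG IF NOT EXISTS fulfillment;
CREATE SCHEMA IF NOT EXISTS fulfillment.DATA_PRODUCTS;

-- Expose an internal silver table as a data product through a view.
-- fulfillment.silver.inventory_history is an assumed internal table.
CREATE VIEW IF NOT EXISTS fulfillment.DATA_PRODUCTS.inventory_history AS
SELECT sku, quantity, updated
FROM fulfillment.silver.inventory_history;

-- Everything listed here counts as a data product of the team.
SHOW TABLES IN fulfillment.DATA_PRODUCTS;
```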
Tables that represent a data product are tagged with the table property data_product_name, which states the name of the data product. The presence of this property marks the table as a data product.
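As an illustration, an existing table could be promoted to a data product, or withdrawn again, by setting or removing the property. The three-level table name below is a hypothetical example.

```sql
-- Mark an existing table as a data product.
ALTER TABLE fulfillment.silver.inventory_history
SET TBLPROPERTIES ('data_product_name' = 'inventory_history');

-- Withdraw the data product by removing the property again.
ALTER TABLE fulfillment.silver.inventory_history
UNSET TBLPROPERTIES IF EXISTS ('data_product_name');
```

The table itself stays where it is; only its metadata changes.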
Data products are prefixed with
E: Access Control
Every table is a potential data product. We use access control to grant other teams access to it.
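A sketch of what granting access could look like under this option, assuming a hypothetical consumer group `analytics-team` and a hypothetical table `fulfillment.silver.inventory_history`. In Unity Catalog, a consumer needs usage privileges on the parent catalog and schema in addition to SELECT on the table.

```sql
-- Allow the consuming group to reach the table's catalog and schema.
GRANT USE CATALOG ON CATALOG fulfillment TO `analytics-team`;
GRANT USE SCHEMA ON SCHEMA fulfillment.silver TO `analytics-team`;

-- Grant read access to the table itself.
GRANT SELECT ON TABLE fulfillment.silver.inventory_history TO `analytics-team`;

-- Review who can currently access the table.
SHOW GRANTS ON TABLE fulfillment.silver.inventory_history;
```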
We use a table property to identify data products.
Each data product must have these table properties set:
| Property | Description |
| data_product_name | The readable name of the data product |
| data_product_domain | The name of the domain |
| data_product_team | The name of the team |
We use the table property data_product_name in Unity Catalog to identify data products.
-- External table registered as a data product via table properties.
CREATE TABLE INVENTORY_HISTORY (
  sku STRING,
  quantity INT,
  updated TIMESTAMP
)
LOCATION 'abfss://firstname.lastname@example.org/fulfillment/inventory_history'
TBLPROPERTIES (
  'data_product_name' = 'inventory_history',
  'data_product_domain' = 'fulfillment',
  'data_product_team' = 'FUFI'
);
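To check that a table carries the required properties, its properties can be inspected. This is a small sketch that assumes the INVENTORY_HISTORY table above exists in the current catalog and schema.

```sql
-- List all table properties, including the three data product properties.
SHOW TBLPROPERTIES INVENTORY_HISTORY;

-- Look up a single property; a table without it is not a data product.
SHOW TBLPROPERTIES INVENTORY_HISTORY ('data_product_name');
```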