Data Classification

Category: Privacy

Context

Managing and securing personal data, Personally Identifiable Information (PII), and business secrets is critical and subject to multiple legal and compliance requirements. Violations or leaks can result in serious penalties or harm to the business.

A first step is to define data classes and their sensitivity.

Decision

Data Classes

We define data classes around sensitivity levels:

| Classification | Data Classes | Access Control |
|---|---|---|
| sensitive | PII, Personal Data, Protected Health Information | No access for analytical use. May be made available as restricted or internal after applying de-identification methods such as aggregation, masking, or differential privacy. |
| restricted | Financial data, contracts, customer communication | Access upon request for specific analytical use cases |
| internal | Business transactions, master data | Access for everyone in the organization |
| public | Publicly available data, external | Access for everyone in the organization |
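
Two of the de-identification methods named for sensitive data, masking and aggregation into coarser buckets, could be sketched as follows. The function names, the masking format, and the ten-year bucket size are assumptions made for illustration; differential privacy is not shown. Whether the output is de-identified enough to be reclassified as restricted or internal remains a decision for privacy experts.

```python
from datetime import date


def mask_email(email: str) -> str:
    """Mask the local part of an email address, keeping its first character and the domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a well-formed address: reveal nothing.
        return "***"
    return local[0] + "***@" + domain


def age_range(birth_date: date, today: date, bucket_size: int = 10) -> str:
    """Generalize a birth date to an age bucket such as '30-39'."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    lower = (age // bucket_size) * bucket_size
    return f"{lower}-{lower + bucket_size - 1}"
```

For example, `age_range(date(1990, 6, 15), date(2024, 6, 14))` yields `"30-39"`, a value that corresponds to the "age range" info type rather than to an exact date of birth.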

By default, we consider all unclassified data to be sensitive.
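
A minimal sketch of how the four classifications and the sensitive-by-default rule could be encoded. The names `Classification`, `effective_classification`, and `analytical_access`, as well as the `approved_request` flag, are hypothetical and not part of any existing module.

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    SENSITIVE = "sensitive"
    RESTRICTED = "restricted"
    INTERNAL = "internal"
    PUBLIC = "public"


def effective_classification(label: Optional[str]) -> Classification:
    """Resolve a label to a classification; missing or unknown labels count as sensitive."""
    if label is None:
        return Classification.SENSITIVE
    try:
        return Classification(label.strip().lower())
    except ValueError:
        # Unknown label: fall back to the most protective class.
        return Classification.SENSITIVE


def analytical_access(classification: Classification, approved_request: bool = False) -> bool:
    """Decide analytical access according to the access control column of the table."""
    if classification is Classification.SENSITIVE:
        return False  # never available for analytical use
    if classification is Classification.RESTRICTED:
        return approved_request  # only upon an approved request
    return True  # internal and public: everyone in the organization
```

Failing closed on unknown labels means that a typo in a classification can only reduce access, never widen it.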

Data Classification

Each info type is assigned a data class and a classification:

| Info Type | Data Class | Classification |
|---|---|---|
| first name | PII | sensitive |
| last name | PII | sensitive |
| home address | PII | sensitive |
| email address | PII | sensitive |
| telephone number | PII | sensitive |
| passport number | PII | sensitive |
| social security number | PII | sensitive |
| photo of face | PII | sensitive |
| credit card number | PII | sensitive |
| account user name | PII | sensitive |
| financial records | PII | sensitive |
| medical records | PII | sensitive |
| fine-grained geolocation | PII | sensitive |
| IP address | PII | sensitive |
| cookie IDs | PII | sensitive |
| device fingerprint | PII | sensitive |
| MAC address | PII | sensitive |
| IMEI | PII | sensitive |
| support tickets | Customer communication | restricted |
| Net Promoter Score | Customer communication (aggregated) | internal |
| contribution margin | Business information | restricted |
| account balance | Financial data | restricted |
| supplier agreements | Contracts | restricted |
| employment contracts | Contracts | restricted |
| salary | Contracts | restricted |
| partial address (country, zip code) | PII (aggregated) | internal |
| age range | PII (aggregated) | internal |
| year of birth | PII (aggregated) | internal |
| gender | PII (aggregated) | internal |
| industry of employment | PII (aggregated) | internal |
| prices | Master data | internal |
| search queries | Business transactions | internal |
| orders | Business transactions | internal |
| product master data | Public | public |
| product images | Public | public |
| ads | Public | public |
| financial statements | Public | public |
| weather | External | public |
| stock prices | External | public |

Note: This list is an example and is not complete. It needs to be adapted and extended for each organization's specific context. Include legal and data privacy experts in the discussion.
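
The info type list can be treated as a lookup table. The sketch below uses a small excerpt of it and assumes that each column has already been tagged with an info type; `INFO_TYPES`, `UNCLASSIFIED`, and `classify_columns` are hypothetical names chosen for illustration.

```python
from typing import Dict, Tuple

# Excerpt of the info type list: info type -> (data class, classification).
INFO_TYPES: Dict[str, Tuple[str, str]] = {
    "email address": ("PII", "sensitive"),
    "passport number": ("PII", "sensitive"),
    "age range": ("PII (aggregated)", "internal"),
    "orders": ("Business transactions", "internal"),
    "product images": ("Public", "public"),
    "weather": ("External", "public"),
}

# Info types missing from the list are unclassified and therefore sensitive.
UNCLASSIFIED: Tuple[str, str] = ("unclassified", "sensitive")


def classify_columns(column_info_types: Dict[str, str]) -> Dict[str, str]:
    """Map each column name to the classification of its info type."""
    return {
        column: INFO_TYPES.get(info_type.strip().lower(), UNCLASSIFIED)[1]
        for column, info_type in column_info_types.items()
    }
```

A column tagged with an info type that is not in the list, such as free text, comes out as sensitive, in line with the default for unclassified data.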

Consequences

Automation

The data classes form the foundation of our Data Catalog taxonomy and are defined as a Terraform module.

The classification of columns can be automated with the open-source BigQuery PII Classifier component (see the blog post "Stop Worrying About BigQuery PII: How to Automate Data Governance at Scale").
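
To illustrate the idea of automated column classification, here is a deliberately simple sketch that suggests an info type from a column name. It does not reproduce the BigQuery PII Classifier or its API; the patterns and the names `NAME_PATTERNS` and `suggest_info_type` are assumptions, and a classifier that inspects the data itself is likely to be more reliable than name matching.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical column-name patterns, each mapped to an info type from the list above.
NAME_PATTERNS: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"e[-_]?mail", re.IGNORECASE), "email address"),
    (re.compile(r"(^|_)(phone|telephone|tel)(_|$)", re.IGNORECASE), "telephone number"),
    (re.compile(r"(^|_)ip(_|$)", re.IGNORECASE), "IP address"),
    (re.compile(r"first_?name", re.IGNORECASE), "first name"),
    (re.compile(r"last_?name|surname", re.IGNORECASE), "last name"),
]


def suggest_info_type(column_name: str) -> Optional[str]:
    """Return the first info type whose pattern matches the column name, else None."""
    for pattern, info_type in NAME_PATTERNS:
        if pattern.search(column_name):
            return info_type
    # No suggestion: the column stays unclassified and is treated as sensitive.
    return None
```

Suggestions produced this way are best reviewed by a data owner before they are applied, since a name such as `contact` gives no hint about what the column contains.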