Uber's Journey Toward Better Data Culture From First Principles

Tags:: 🗞️Articles, Data Quality, Data culture
Author:: Krishna Puttaswamy and Suresh Srinivas
Link:: Uber’s Journey Toward Better Data Culture From First Principles | Uber Engineering Blog
Source date:: 2021-03-16
Finished date:: 2021-04-24

Data Quality Checks from Uber:

Freshness: the time delay between when data is produced and when it is 99.9% complete in the destination system. The measure includes a watermark for completeness (default set to three 9s), because optimizing for freshness alone, without considering completeness, leads to poor-quality decisions.

Completeness: the # of rows in the destination system as a % of the # of rows in the source system.

Duplication: % of rows that have duplicate primary or unique keys. The default is 0% duplication in raw data tables, while a small % of duplication is allowed in modeled tables.

Cross-data-center consistency: % of data loss when a copy of a dataset in the current datacenter is compared to the copy in another datacenter.

Semantic checks: capture critical properties of fields in the data, such as null/not-null, uniqueness, # of distinct values, and range of values.
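
To make the definitions above concrete, here is a minimal Python sketch of how each check might be computed over in-memory data. It is not Uber's implementation. All function names, parameters, and the exact formulas are assumptions made for illustration: duplication is read as "rows whose key appears more than once", and cross-data-center loss is read as "keys present in the other copy but missing from the current one".

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def freshness(produced_at: datetime, arrival_times: list,
              watermark: float = 0.999) -> timedelta:
    """Delay until `watermark` (default three 9s) of rows have arrived."""
    if not arrival_times:
        raise ValueError("no rows have arrived yet")
    ordered = sorted(arrival_times)
    # Number of rows that must be present to reach the watermark.
    needed = math.ceil(watermark * len(ordered))
    return ordered[needed - 1] - produced_at


def completeness_pct(source_rows: int, destination_rows: int) -> float:
    """Destination row count as a % of the source row count."""
    if source_rows == 0:
        return 100.0  # assumption: an empty source counts as complete
    return 100.0 * destination_rows / source_rows


def duplication_pct(keys: list) -> float:
    """% of rows whose key is shared with at least one other row."""
    if not keys:
        return 0.0
    counts = Counter(keys)
    duplicated_rows = sum(n for n in counts.values() if n > 1)
    return 100.0 * duplicated_rows / len(keys)


def cross_dc_loss_pct(current_keys: set, other_keys: set) -> float:
    """% of the other data center's keys missing from the current copy."""
    if not other_keys:
        return 0.0
    return 100.0 * len(other_keys - current_keys) / len(other_keys)


def semantic_profile(values: list) -> dict:
    """Null count, uniqueness, distinct count, and range for one field."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "null_count": len(values) - len(non_null),
        "is_unique": len(distinct) == len(non_null),
        "distinct_count": len(distinct),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
```

In practice these numbers would be compared against per-table thresholds, for example 0% duplication for a raw table and a small nonzero allowance for a modeled table, and they would usually be computed by queries against the warehouse rather than over Python lists.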
