• Data Quality Checks from Uber:

Freshness: the time delay between when data is produced and when it is 99.9% complete in the destination system. The definition includes a completeness watermark (default three 9s, i.e. 99.9%), because optimizing for freshness alone, without considering completeness, leads to poor-quality decisions.

Completeness: the number of rows in the destination system as a percentage of the number of rows in the source system.

Duplication: the percentage of rows that have duplicate primary or unique keys. The default threshold is 0% duplication in raw data tables, while a small percentage of duplication is allowed in modeled tables.

Cross-data-center consistency: the percentage of data loss found when a copy of a dataset in the current data center is compared to the copy in another data center.

Semantic checks: capture critical properties of fields in the data, such as null/not-null, uniqueness, number of distinct values, and range of values.
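A rough sketch of how the checks above could be computed, in plain Python. This is not Uber's implementation: every function name and signature is hypothetical, and several definitions rest on assumptions. Duplication counts only surplus rows beyond the first occurrence of a key, cross-data-center loss is measured as local keys missing from the remote copy, and freshness is derived from per-row arrival timestamps.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Hashable, Optional, Sequence


def completeness_pct(source_rows: int, destination_rows: int) -> float:
    """Destination row count as a percentage of the source row count."""
    if source_rows == 0:
        return 100.0  # assumption: an empty source is trivially complete
    return 100.0 * destination_rows / source_rows


def freshness_delay(
    produced_at: datetime,
    arrival_times: Sequence[datetime],
    source_rows: int,
    watermark_pct: float = 99.9,  # the "three 9s" default from the text
) -> Optional[timedelta]:
    """Delay until the destination reaches the completeness watermark.

    Returns None if the watermark has not been reached yet.
    """
    for i, arrived_at in enumerate(sorted(arrival_times)):
        # After the (i + 1)-th arrival, the destination holds i + 1 rows.
        if completeness_pct(source_rows, i + 1) >= watermark_pct:
            return arrived_at - produced_at
    return None


def duplication_pct(keys: Sequence[Hashable]) -> float:
    """Percentage of rows whose key repeats an earlier row's key."""
    if not keys:
        return 0.0
    return 100.0 * (len(keys) - len(set(keys))) / len(keys)


def cross_dc_loss_pct(
    local_keys: Sequence[Hashable], remote_keys: Sequence[Hashable]
) -> float:
    """Percentage of local keys that are missing from the remote copy."""
    local = set(local_keys)
    if not local:
        return 0.0
    return 100.0 * len(local - set(remote_keys)) / len(local)


def semantic_profile(values: Sequence[Optional[float]]) -> Dict[str, Any]:
    """Null count, uniqueness, distinct count, and range for one field."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "null_count": len(values) - len(non_null),
        "is_unique": len(distinct) == len(non_null),
        "distinct_count": len(distinct),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
```

Each function returns a raw measurement rather than a pass/fail result, so the thresholds mentioned above (0% duplication for raw tables, a small allowance for modeled tables) can be applied per table by the caller.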