Metadata
- Author:: Anna Filippova
- Full Title:: How to Build a Resilient DAG
- Category:: 🗞️Articles
- URL:: https://roundup.getdbt.com/p/build-a-resilient-dag
- Finished date:: 2023-02-06
Highlights
denormalization and other forms of data duplication are used sparingly, and only to maintain necessary historical context (View Highlight)
Event data is always going to be your largest and slowest job. It’s also the type of data that usually needs to land in the warehouse the quickest to power up-to-the-minute insights. Set up incremental models for event data as early as possible (don’t wait for this dataset to get big!), and run these jobs as frequently as you can (View Highlight)
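In dbt terms, this maps to an incremental model on your event source. A minimal sketch, assuming a staging model named `stg_events` with `event_id` and `event_timestamp` columns (all names here are illustrative):

```sql
-- models/fct_events.sql — incremental model for event data (sketch)
{{
  config(
    materialized='incremental',
    unique_key='event_id'
  )
}}

select
    event_id,
    event_timestamp,
    user_id,
    event_name
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- on incremental runs, only process events newer than what
  -- has already landed in the table
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

Because the `is_incremental()` branch only scans new rows, this job stays cheap enough to run on a tight schedule even as the event table grows.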
reserve most of your data testing for further downstream. For example, your daily jobs will likely pull together context to enrich your event data, and do what is needed to compute your daily metrics. This is the ideal place to put in place some sort of variance testing (View Highlight)
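One way to express variance testing in dbt is a singular test that fails when the latest daily value deviates too far from its trailing average. A sketch, assuming a `fct_daily_metrics` model with `metric_date` and `metric_value` columns; the 50% threshold and 7-day baseline are illustrative choices:

```sql
-- tests/assert_daily_metric_within_variance.sql (sketch)
with recent as (
    select metric_date, metric_value
    from {{ ref('fct_daily_metrics') }}
    where metric_date >= current_date - 8
),

baseline as (
    -- trailing average over the prior week
    select avg(metric_value) as avg_value
    from recent
    where metric_date < current_date
)

-- a dbt singular test fails if this query returns any rows
select r.metric_date, r.metric_value, b.avg_value
from recent r
cross join baseline b
where r.metric_date = current_date
  and abs(r.metric_value - b.avg_value) > 0.5 * b.avg_value
```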
the fewer times you want to re-run a computation, the more rigorously you should be evaluating your data quality (View Highlight)
New highlights added 2023-08-09
if your cloud warehouse charges you based on the number of rows queried, you can safely make a certain number of days of highly granular event data available for direct querying and put the rest in cold storage (View Highlight)
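A simple way to expose only the hot window is a view over the full event table. A sketch, assuming a `fct_events` model and a Snowflake-style `dateadd` function; the 90-day window is illustrative:

```sql
-- models/events_recent.sql — keep only a hot window queryable (sketch)
{{ config(materialized='view') }}

select *
from {{ ref('fct_events') }}
-- expose only recent, highly granular events for direct querying;
-- older data stays in cheaper cold storage
where event_timestamp >= dateadd(day, -90, current_date)
```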
after the close of a month, we generally want to “finalize” metrics related to the past month’s revenue, product usage, and the state of our product funnel. In these conversations, once a period passes, we reason in months and not days. And unless we got these metrics seriously wrong, we generally don’t need to compute them again and again. (View Highlight)
Similarly, we’d want to lock down the values of important metrics for a past quarter in preparation for a board meeting, and once we reach the end of a fiscal year to allow us to visualize trends on a longer time horizon. (View Highlight)
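This "finalize and never recompute" pattern can be sketched as an append-only incremental model that only ever picks up fully closed months. The `fct_daily_metrics` source and its columns are assumptions:

```sql
-- models/fct_monthly_metrics_final.sql — lock in closed periods (sketch)
{{
  config(
    materialized='incremental',
    unique_key='metric_month'
  )
}}

select
    date_trunc('month', metric_date) as metric_month,
    sum(revenue) as total_revenue
from {{ ref('fct_daily_metrics') }}
-- only include months that have fully closed
where date_trunc('month', metric_date) < date_trunc('month', current_date)
{% if is_incremental() %}
  -- never recompute months that were already finalized
  and date_trunc('month', metric_date) not in
      (select metric_month from {{ this }})
{% endif %}
group by 1
```

Once a month lands here, downstream reporting reads the locked value instead of re-aggregating raw data on every run.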
Don’t pre-compute numbers before you need them (View Highlight)
This principle is all about avoiding “cubes”. (View Highlight)
Outside of your event/snapshots jobs and your end-of-period business close jobs (monthly, quarterly etc.), what you want to achieve is a maximally flexible system that can be layered on in regular increments to power new and interesting combinations of metrics and exploration (View Highlight)
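The "maximally flexible" middle layer often looks like a thin enriched model rather than a pre-aggregated cube: keep the grain at the event level, attach dimensional context, and let many metric models be layered on top. A sketch with assumed `fct_events` and `dim_users` models:

```sql
-- models/int_events_enriched.sql — a thin, reusable layer (sketch)
-- joins granular events to dimensional context so downstream metric
-- models can slice in new ways without pre-aggregating into cubes
select
    e.event_id,
    e.event_timestamp,
    e.event_name,
    u.plan_tier,
    u.signup_date
from {{ ref('fct_events') }} e
left join {{ ref('dim_users') }} u
    on e.user_id = u.user_id
```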