🗞️ The missing piece of the modern data stack

Tags:: 🗞️Articles , Metrics layer
Author:: Benn Stancil
Link:: The missing piece of the modern data stack (substack.com)
Source date:: 2021-04-22
Finished date:: 2021-04-22

Before a metric layer, you make do with rollups (Rollup tables):

Consider the journey that data follows to reach that dashboard. After being written into a warehouse (…) data is updated by a transformation tool (…) several times, passing through a couple types of aggregations along the way.

Cleaned granular data: The first stage creates dimension and fact tables, which maintain the same degree of granularity as raw tables while removing the irregularities and aesthetic irritations that make raw data difficult to work with.

A second stage rolls up granular data into aggregated metrics tables. For example, a rollup_active_users table might include rows for daily, weekly, rolling 7-day, and rolling 28-day active users, signups, returning users, and so on.

Rollup tables are typically generated by transformation tools like dbt, so the metrics in these tables can be consistently defined and reliably governed. However, because rollup tables are precomputed, there’s a practical limit to how many can be created. As a result, they’re often only built for top-level metrics, like active users or customer NPS.

But self-serve analysis requires another level of depth—daily active users for a particular customer segment, or NPS for a particular type of user. Even with just a handful of metrics and segments, it’s all but impossible to precompute every possible combination

Without a rollup to draw from, data consumers have to follow the second path: aggregate new metrics directly from dimension tables. That leaves the nature of the aggregation up to the person doing the analysis, and these aggregations are rarely simple. Counting weekly orders in Europe, for example, requires you to define week, order, and Europe. Do weeks start on Sunday or Monday? In which time zone? Do orders include those made with gift cards? What about returns? And are European customers those with billing addresses or shipping addresses in Europe? Are Russian customers European? Are British customers European? While all of this logic might live in the rollup_orders table, it isn’t necessarily in the dimension_orders table, meaning someone has to apply it on their own to do their analysis. This makes it incredibly difficult for people, especially people who aren’t analysts and aren’t familiar with the weird nuances that riddle most datasets, to consistently arrive at the same result.

Y este es el quid de la cuestión por lo que necesitas una metrics layer (en BI, o mejor, fuera): porque no puedes hacer rollups con todo.

BI tools appear to solve this problem by offering ways for people to define on-the-fly computations in reports and dashboards. Through LookML in Looker, calculated fields in Tableau and Mode, and formulas in Sisense, analysts can configure visualizations to aggregate results in specific ways.

No code no tiene por qué ser menos complejo…

If the query language is simple enough, text-based commands can actually be more accessible than the complex “code-free” UIs of tools like Tableau.

Dr. Mario's 2nd 🧠

Explorer

🗞️ The missing piece of the modern data stack

Graph View

Backlinks