Metadata
- Author: Maxime Beauchemin
- Full Title:: Introducing Entity-Centric Data Modeling for Analytics
- Category:: 🗞️Articles
- Document Tags:: data modeling
- URL:: https://preset.io/blog/introducing-entity-centric-data-modeling-for-analytics/
- Finished date:: 2023-04-23
Highlights
The core idea of when to use this:
Dimensional modeling does a great job of capturing, summarizing, and simplifying the underlying logical model with its fact and dimension tables approach, and makes it really natural to do a multi-dimensional analysis of facts. Where it falls short is in multi-factual analysis of entities. (View Highlight)
an entity-focussed wide dataset (e.g., user, customer, ad campaign, …), with a large though clearly labeled set of columns, is highly intuitive and easy to use for everyone. (View Highlight)
To be clear, ECM aligns closely with dimensional modeling as it is entity-centric in many ways and supports the existence of fact tables; it just considers that metrics can also live in the dimension tables. (View Highlight)
where Ralph Kimball would frown upon bringing many metrics in dimension tables (to him, metrics strictly belong in fact tables) and would not have considered using more complex data structures as they were not common practice in his era, the entity-centric approach actually prescribes doing this and offers methodologies as to how to do this well - more on that later on. (View Highlight)
thinking of “user visits” as a set of features in the user table, like “7d visits”, “28d visits”, and “total visits since account creation”, allows us to think in terms of cohorts and distributions, such as “users who visited more than 14 out of the past 28 days are significantly more likely to use feature X”. The idea behind ECM is to bring that mindset and the power that comes with it to the analytics and data warehousing side of the fence. In other words, the goal is to take the entity-centric thinking and flat, wide data modeling techniques that are commonly used in feature engineering and apply them to analytics and data warehousing (View Highlight)
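A minimal sketch of the idea in the last highlight, under stated assumptions: a visit fact log is rolled up into one wide row per user, and a cohort is then selected directly from that row. The function names (`build_user_table`, `frequent_visitors`), the column names, and the sample data are all hypothetical and do not come from the article. The windows count distinct visit days, so that "more than 14 out of the past 28 days" can be read straight off a column.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_user_table(visits, as_of):
    """Roll a visit fact log up into one wide row per user.

    `visits` is an iterable of (user_id, visit_date) pairs, one per visit
    event. This input shape is an assumption made for illustration.
    """
    days_by_user = defaultdict(set)   # distinct visit days per user
    total_by_user = defaultdict(int)  # raw visit count per user
    for user_id, visit_date in visits:
        if visit_date > as_of:
            continue  # ignore events after the snapshot date
        days_by_user[user_id].add(visit_date)
        total_by_user[user_id] += 1

    def active_days(days, window):
        # Count distinct visit days in the `window` days ending on `as_of`.
        start = as_of - timedelta(days=window - 1)
        return sum(1 for d in days if d >= start)

    return {
        user_id: {
            "visit_days_7d": active_days(days, 7),
            "visit_days_28d": active_days(days, 28),
            "visits_total": total_by_user[user_id],
        }
        for user_id, days in days_by_user.items()
    }


def frequent_visitors(user_table, threshold=14):
    """Users who visited on more than `threshold` of the past 28 days."""
    return sorted(
        user_id
        for user_id, row in user_table.items()
        if row["visit_days_28d"] > threshold
    )


# Made-up sample data: user "a" visits daily for 20 days, user "b" rarely.
as_of = date(2023, 4, 23)
visits = [("a", as_of - timedelta(days=i)) for i in range(20)]
visits += [
    ("b", as_of),
    ("b", as_of),  # second visit on the same day
    ("b", as_of - timedelta(days=10)),
    ("b", as_of - timedelta(days=40)),  # outside the 28-day window
]

user_table = build_user_table(visits, as_of)
cohort = frequent_visitors(user_table)
```

Once the metrics sit on the user row, the cohort question becomes a simple filter on one table rather than a join and aggregation against the fact table at query time.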