
Highlights

I often think of ops as the forgotten stepchild of data. In many ways, ops teams are just data teams with worse tools and more chaos. Yet, in the best organizations, I’ve seen ops and data teams work closely together to enable each other.

This is starting to change. Emilie Schario is building Turbine, which offers a fresh take on procurement, inventory, and supply chain management. Savant Labs is creating a cloud-native automation platform for analysts. Zamp is automating sales tax compliance. Lantern is automating forecasting and churn reporting to enable CS teams to understand their customers better.

While some tools may be happy to throw in their lot with the dbt Cloud Semantic Layer, I see two obvious issues: 1) it limits their customers to only those who are on dbt Cloud, and 2) it puts their product roadmap in the hands of another company.

Metrics layer

There are, however, some interesting alternative approaches to this space. For example, Honeydew is building a semantic layer middleware that can be consumed directly from the warehouse, obviating the need for integrations with another tool’s semantic layer. Any application that can read from the warehouse can read semantic information. Julian Hyde’s work on Apache Calcite aims to bring metrics directly into SQL, with the long-term hope of standardizing how we express metrics and storing them in the warehouse outright. Malloy is an alternate take, a new open-source language for expressive data modeling.
When someone asks you to explain how we calculate churn, you could direct them to documentation that explains how churn is calculated, but the true definition of churn isn’t in the documentation; it’s in the DAG of transforms, filters, conditions, aggregations and more that occur step-by-step to turn a sequence of events, invoices, subscriptions, payments, and constants into a monthly view of reported churn. This complexity is why the demand for and appeal of column-level lineage are ever-growing. Understanding how this number came to be means understanding the web of inputs that led to it.
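The point can be made concrete with a toy sketch. All names, statuses, and numbers here are hypothetical, and real pipelines involve far more steps, but even this "simple" churn figure is the last node of a small DAG of filters and aggregations, not a single documented formula:

```python
# Illustrative only: a toy "DAG" of transforms that turns raw subscription
# events into a monthly churn rate. Every name and value here is made up.
from collections import defaultdict

# Node 1: raw events -- (customer_id, month, status)
events = [
    ("a", "2023-01", "active"), ("a", "2023-02", "active"),
    ("b", "2023-01", "active"), ("b", "2023-02", "canceled"),
    ("c", "2023-01", "active"), ("c", "2023-02", "active"),
]

# Node 2: filter + aggregate -- the set of active customers per month
active = defaultdict(set)
for customer, month, status in events:
    if status == "active":
        active[month].add(customer)

# Node 3: the metric itself -- customers active last month but gone this
# month, divided by last month's active base
def monthly_churn(prev_month: str, month: str) -> float:
    base = active[prev_month]
    lost = base - active[month]
    return len(lost) / len(base)

print(monthly_churn("2023-01", "2023-02"))  # customer "b" churned: 1/3
```

Tracing why the output is one third already requires walking back through every upstream node, which is exactly the question column-level lineage tools try to answer at warehouse scale.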

Data mesh is, in some ways, the admission of defeat in the face of complexity. The demands of teams are so complex that we must break apart the whole thing into smaller, more manageable chunks. Sales gets their metric, and Marketing gets theirs. When someone asks why the numbers don’t match, we tell them that they don’t match because they are different.

The other option is forcing stakeholders to make tough decisions about simplifying their requests; that’s a dream we can all have. We can stare into the abyss and demand that the pit of despair leave us be.

We may decide to quit data altogether and see what those software engineers are up to—they seem much happier.

So, the future looks much like today

But we wake from our dream. The data remains a mess. The stakeholders remain impatient. The work never ends, and we press on—one reconciliation at a time.