rw-book-cover

Metadata

Highlights

✅ int_[entity]s_[verb]s.sql - the variety of transformations that can happen inside of the intermediate layer makes it harder to dictate strictly how to name them. The best guiding principle is to think about verbs (e.g. pivoted, aggregated_to_user, joined, fanned_out_by_quantity, funnel_created, etc.) in the intermediate layer. In our example project, we use an intermediate model to pivot payments out to the order grain, so we name our model int_payments_pivoted_to_orders. It’s easy for anybody to quickly understand what’s happening in that model, even if they don’t know SQL. That clarity is worth the long file name. (View Highlight)

Exposed to end users. Intermediate models should generally not be exposed in the main production schema. They are not intended for output to final targets like dashboards or applications, so it’s best to keep them separated from models that are so you can more easily control data governance and discoverability. (View Highlight)

There are three interfaces to the organizational knowledge graph we’re encoding into dbt: the DAG, the files and folder structure of our codebase, and the output into the warehouse. As such, it’s really important that we consider that output intentionally! Think of the schemas, tables, and views we’re creating in the warehouse as part of the UX, in addition to the dashboards, ML, apps, and other use cases you may be targeting for the data. Ensuring that our output is named and grouped well, and that models not intended for broad use are either not materialized or built into special areas with specific permissions is crucial to achieving this. (View Highlight)

Narrow the DAG, widen the tables. Until we get to the marts layer and start building our various outputs, we ideally want our DAG to look like an arrowhead pointed right. (View Highlight)

One rule of thumb to ensure you’re following this pattern on an individual model level is allowing multiple inputs to a model, but not multiple outputs. (View Highlight)