
Metadata

Highlights

the architecture of dbt is dated. Like Airflow, this is a major disadvantage: they have some major tech debt, and it’ll be tough to implement foundational changes

It’s a mystery why Google hasn’t invested more in the Dataform + BigQuery experience, given the insane opportunity they have. As I’ll discuss later, many are jumping ship from Redshift. The potential for Google to roll out a “data environment in a box” is super enticing, but who knows what’s going on in product over there? 🧐

little has been done to simplify the process of building a functional analytics solution. Take, for example, creating a test/production environment. This is a relatively trivial problem that’s actually become easier in products like Snowflake with zero-copy clones, but for tools like Redshift or BigQuery, it’s necessary to architect some unnecessarily complex process: either restoring a snapshot nightly or programmatically generating SQL. The worst part: most of the clever solutions to this problem are buried in forums, Slack, etc. Once again, knowing someone who’s solved this problem for your specific warehouse can be a huge time saver.
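A rough sketch of the two approaches in that highlight, to make the contrast concrete. The schema names (`prod`, `dev`) and both helper names are hypothetical, not from the article. The first helper emits a Snowflake `CREATE ... CLONE` statement. The second emits a generic drop-and-recreate pair per table, as one way of "programmatically generating SQL" for warehouses without cloning. Check the output against your warehouse's dialect before running it.

```python
import re

# Only allow plain identifiers, so generated SQL cannot be hijacked by odd names.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def snowflake_clone_sql(source_schema: str, target_schema: str) -> str:
    """Hypothetical helper: one statement using Snowflake's zero-copy clone."""
    src, dst = _check(source_schema), _check(target_schema)
    return f"CREATE OR REPLACE SCHEMA {dst} CLONE {src};"


def ctas_refresh_sql(tables, source_schema: str, target_schema: str) -> list:
    """Hypothetical helper: drop and recreate each table as a full copy.

    Unlike a clone, this physically copies the data, so it costs
    storage and time proportional to table size.
    """
    src, dst = _check(source_schema), _check(target_schema)
    statements = []
    for table in tables:
        t = _check(table)
        statements.append(f"DROP TABLE IF EXISTS {dst}.{t};")
        statements.append(f"CREATE TABLE {dst}.{t} AS SELECT * FROM {src}.{t};")
    return statements


if __name__ == "__main__":
    print(snowflake_clone_sql("prod", "dev"))
    for stmt in ctas_refresh_sql(["orders", "customers"], "prod", "dev"):
        print(stmt)
```

The clone version is a single metadata operation, while the generated version grows with the number of tables and has to be scheduled and monitored, which is the extra complexity the highlight complains about.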

📖 The Innovator’s Dilemma: workflow orchestrators

Airflow is obsolete. Speaking of obsolete products, I would highly advise against an Airflow implementation. Why? There are a number of tools (Dagster, Prefect, Mage, to name a few) that are being built from the ground up to address Airflow’s failures. These solutions are more nimble than Airflow and can iterate fast. One of the biggest downfalls of Airflow has been its success: now, the open source community has to focus on maintaining the product to be sure it doesn’t break existing deployments rather than innovating. The 717 open issues on GitHub (as of this writing) are a testament to this.

If you’re looking for an orchestrator (and executor, which Airflow is not) that features a testing framework, better observability, support for dataframes as assets/objects, and tighter integration with data transformation tools like dbt, I’d highly suggest one of the above products/libraries.

Same experience with Airbyte at Mercadona Tech

I find Airbyte’s marketing disingenuous, as it might seem like a Fivetran killer when, in fact, it breaks in most use cases (I can confirm this from personal experience).