- Tags:: 🗞️Articles, Data Architectures
- Author:: Matt Bornstein, Martin Casado, Jennifer Li
- Link:: Emerging Architectures for Modern Data Infrastructure: 2020 | Future
- Source date:: 2020-10-15
- Finished date:: 2021-04-24
A widely cited article by a16z, featuring its iconic data infrastructure diagram.
Lakehouse: the convergence of the data lake and the data warehouse. Data lakes (usually used for ML) and data warehouses (usually used for BI) are growing more alike:
Each of these technologies has religious adherents, and building around one or the other turns out to have a significant impact on the rest of the stack (more on this later). But what’s really interesting is that modern data warehouses and data lakes are starting to resemble one another – both offering commodity storage, native horizontal scaling, semi-structured data types, ACID transactions, interactive SQL queries, and so on.
The key question going forward: are data warehouses and data lakes on a path toward convergence? That is, are they becoming interchangeable in the stack? Some experts believe this is taking place and driving simplification of the technology and vendor landscape. Others believe parallel ecosystems will persist due to differences in languages, use cases, or other factors.