• Tags:: 📝CuratedNotes , Streaming

You have events happening somewhere (e.g., GPS coordinates of vehicles), and you cannot wait long to analyze them (because you have time-sensitive actions to take). There are three main options to enable that:
  • The simplest one is just ingesting those events into any data warehouse and building regular views over the events table(s). Depending on the event velocity and the latency requirements of the scenario, this may be enough.
  • Today, the most comfortable option is a streaming storage/processing system (Spark’s Structured Streaming, Materialize, Rockset…) that supports incremental computation, giving you an incrementally updated materialized view. Note that these are different from “traditional” materialized views (even in Snowflake or BigQuery), which maintain the materialization simply by re-running the view’s query on a schedule.
  • Alternatively, in the past, you had to either:
    • Build incrementalization by yourself, through (micro)batching.
    • Use dataflow systems during ingestion to perform computations over the streams (e.g., Google Dataflow).
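As a sketch of the DIY (micro)batching option above: fold small batches of incoming events into a running state instead of recomputing over the full history each time. This is a minimal Python illustration; the event shape and field names (`vehicle_id`, `lat`, `lon`) are assumptions, not from any particular system.

```python
def micro_batch_view(event_stream, batch_size, state=None):
    """Maintain the latest position per vehicle by applying small
    batches of events to a running state (hand-rolled
    incrementalization via micro-batching)."""
    state = state if state is not None else {}
    batch = []

    def apply_batch():
        # Fold the pending batch into the running state.
        for e in batch:
            state[e["vehicle_id"]] = (e["lat"], e["lon"])
        batch.clear()

    for event in event_stream:
        batch.append(event)
        if len(batch) >= batch_size:
            apply_batch()
            yield dict(state)  # emit an updated "materialized view"
    if batch:                  # flush the final partial batch
        apply_batch()
        yield dict(state)

# Usage: three GPS events, processed in micro-batches of two.
events = [
    {"vehicle_id": "bus-1", "lat": 40.0, "lon": -3.7},
    {"vehicle_id": "bus-2", "lat": 41.4, "lon": 2.2},
    {"vehicle_id": "bus-1", "lat": 40.1, "lon": -3.6},
]
views = list(micro_batch_view(iter(events), batch_size=2))
# the last view holds the latest known position for each vehicle
```

Real systems add checkpointing and watermarking on top of this idea; the streaming databases mentioned above do the equivalent bookkeeping for you.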

Refs

Navigating the Streaming Landscape - Speaker Deck