Netflix built a Real-Time Distributed Graph (RDG) to stitch together member activity across devices, apps, and business lines - processing over 5 million records per second using Kafka and Apache Flink.
Graph database - A way of storing data that emphasizes the relationships between things, rather than just the things themselves. Instead of rows in a table, you store "nodes" (entities like a user or a show) and "edges" (connections like "watched" or "logged in"). This makes it fast to ask questions like "what did this user do across all their devices?"
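The node-and-edge model above can be sketched in a few lines of plain Python. This is a toy illustration, not Netflix's actual data model; all names (`add_node`, `neighbors`, the id formats) are made up for the example.

```python
from collections import defaultdict

# Toy property graph: nodes keyed by id, edges stored as adjacency lists.
class Graph:
    def __init__(self):
        self.nodes = {}                 # node_id -> properties
        self.edges = defaultdict(list)  # node_id -> [(edge_label, target_id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def neighbors(self, src, label):
        return [dst for (lbl, dst) in self.edges[src] if lbl == label]

g = Graph()
g.add_node("member:42", kind="member")
g.add_node("device:tv-1", kind="device")
g.add_node("show:stranger-things", kind="show")
g.add_edge("member:42", "logged_in", "device:tv-1")
g.add_edge("member:42", "watched", "show:stranger-things")

# "What did this member watch?" is a one-hop lookup, not a multi-table join.
print(g.neighbors("member:42", "watched"))  # ['show:stranger-things']
```

Because the connections are stored directly on the nodes, following a relationship is a lookup rather than a join.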
Stream processing - Analyzing data in real time as it arrives, rather than waiting to collect it in batches. Think of it like a live sports ticker versus a newspaper box score printed the next morning.
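The ticker-versus-box-score contrast can be shown with a minimal sketch: a batch computation answers once after all events are collected, while a streaming computation emits an updated answer after every event. The event names here are invented for illustration.

```python
# Hypothetical event log: member actions arriving over time.
events = ["play", "pause", "play", "play"]

# Batch: one answer, available only after collection is complete.
batch_play_count = sum(1 for e in events if e == "play")

# Streaming: an up-to-date answer after every event (the "live ticker").
def stream_play_counts(event_iter):
    count = 0
    for e in event_iter:
        if e == "play":
            count += 1
        yield count  # emitted immediately, not at the end

ticker = list(stream_play_counts(events))
print(ticker)            # [1, 1, 2, 3]
print(batch_play_count)  # 3
```

Both arrive at the same final number; the difference is that the stream had a usable answer the whole time.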
Apache Kafka - A high-throughput messaging system that acts as a central hub for events. When a Netflix member presses play, that action becomes a message published to Kafka, where other systems can pick it up instantly.
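Kafka's core abstraction - an append-only log that producers write to and consumers read at their own offset - can be mimicked with a toy in-memory model. This is a simplification for intuition only; real code would use a Kafka client library, and this sketch omits partitions, brokers, and persistence.

```python
# Toy in-memory "topic": producers append, each consumer tracks its own offset.
class Topic:
    def __init__(self):
        self.log = []  # append-only event log

    def publish(self, event):
        self.log.append(event)

class Consumer:
    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # each consumer keeps its own read position

    def poll(self):
        msgs = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return msgs

playback = Topic()
graph_builder = Consumer(playback)  # e.g. a downstream processing job
playback.publish({"member": 42, "action": "play", "title": "stranger-things"})
print(graph_builder.poll())  # the event is available to consumers immediately
```

Because each consumer tracks its own offset, many independent systems can read the same event stream without interfering with one another.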
Apache Flink - A framework for writing programs that consume Kafka streams and process events in real time. Netflix uses Flink jobs to transform raw app events into graph nodes and edges.
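The transform step - raw app events in, graph records out - can be sketched as a simple map function. The field names (`member_id`, `device_id`, `action`) and record shapes are assumptions for illustration, not Netflix's actual schema; a real Flink job would apply a function like this over a Kafka-backed stream.

```python
# Sketch: turn one raw app event into the graph records it implies.
def event_to_graph_records(event):
    member = f"member:{event['member_id']}"
    device = f"device:{event['device_id']}"
    return [
        {"type": "node", "id": member},
        {"type": "node", "id": device},
        {"type": "edge", "src": member, "label": event["action"], "dst": device},
    ]

raw = {"member_id": 42, "device_id": "tv-1", "action": "logged_in"}
for rec in event_to_graph_records(raw):
    print(rec)
```

Each incoming event fans out into a handful of node and edge records, which is part of why the downstream write rate (5M+/s) exceeds the ingest rate (1M/s per topic).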
Microservices - An architecture where a large application is split into many small, independent services. Netflix has hundreds of them, each owning its own data - which is powerful for engineering teams but creates silos when you need a unified view of a member.
Netflix's product now spans streaming, ads, live events, and mobile games - and its original data infrastructure wasn't built to connect user behavior across these silos. The Real-Time Distributed Graph was the solution.
A graph model beats traditional tables when the relationships between data matter as much as the data itself. Netflix can now ask "what did this member do across all devices in the last few minutes?" without expensive joins.
Processing 1 million Kafka messages per second per topic requires careful operational discipline. Netflix learned that one monolithic Flink job is a scaling trap - per-topic jobs are simpler to tune and much easier to keep healthy.
The pipeline writes over 5 million graph records per second downstream, which is only feasible because of deduplication and buffering built into the Flink processing layer.
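The two techniques the text credits - deduplication and buffering - can be sketched together in a toy operator. A real Flink job would use keyed state and windows for this; here a set stands in for keyed state and a batch buffer stands in for windowed writes, with all names invented for the example.

```python
# Toy dedup-and-buffer stage in front of a downstream sink.
class DedupBuffer:
    def __init__(self, flush_size):
        self.seen = set()            # keys already emitted downstream
        self.buffer = []             # records waiting to go out in one batch
        self.flush_size = flush_size
        self.flushed = []            # stands in for the downstream sink

    def process(self, record):
        key = (record["src"], record["label"], record["dst"])
        if key in self.seen:
            return                   # duplicate: drop instead of rewriting
        self.seen.add(key)
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_size:
            self.flushed.extend(self.buffer)  # one bulk write downstream
            self.buffer.clear()

sink = DedupBuffer(flush_size=2)
edge = {"src": "member:42", "label": "watched", "dst": "show:a"}
for rec in [edge, edge,
            {"src": "member:42", "label": "watched", "dst": "show:b"}]:
    sink.process(rec)
print(len(sink.flushed))  # 2: the duplicate was dropped, batch flushed once
```

Dropping duplicates cuts the write volume, and batching turns many small writes into a few bulk ones - both of which matter at millions of records per second.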
Building at this scale is a platform problem, not just an application problem. Netflix explicitly credits its internal data platform teams - without shared Kafka, Flink, and Iceberg infrastructure, the RDG could not have been built.