OpenAI runs ChatGPT for 800 million users on a single-primary PostgreSQL instance with nearly 50 read replicas - no sharding, no distributed database. Here's the engineering that keeps it at five-nines availability with p99 latency in the low tens of milliseconds.
Database - a system that stores and retrieves data for an application. Every time you send a ChatGPT message, a database is involved.
PostgreSQL - a popular open-source database. Think of it as a very fast, very reliable filing cabinet for structured data.
Read replica - a read-only copy of the database. It can answer queries but can't accept new writes, allowing you to spread read traffic across many machines.
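To make the read-replica idea concrete, here is a minimal sketch of read/write routing. This is not OpenAI's implementation - the `ReplicaRouter` class, its naive SQL prefix check, and the round-robin choice of replica are all illustrative assumptions.

```python
import itertools

class ReplicaRouter:
    """Hypothetical sketch: send writes to the primary, spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; real systems also weigh health and lag.
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Illustrative only: a real router inspects the query (or uses separate
        # read/write code paths) rather than a string prefix check.
        if sql.lstrip().upper().startswith(("SELECT", "SHOW")):
            return next(self._replicas)
        return self.primary

router = ReplicaRouter("primary", ["replica-1", "replica-2"])
```

With this shape, adding read capacity is just adding another entry to the replica list - the write path to the single primary is untouched.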
Sharding - splitting a database into smaller pieces across multiple machines so no single one holds all the data. Powerful but complex to manage.
Connection pooling - instead of opening a fresh connection to the database for every request (slow), a pool of pre-opened connections is reused (fast).
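A connection pool can be sketched in a few lines. This toy version (the `ConnectionPool` class and its method names are my own, not any particular pooler's API) shows the core mechanic: open N connections once, then hand them out and take them back.

```python
import queue

class ConnectionPool:
    """Toy pool: pre-open a fixed number of connections and reuse them."""

    def __init__(self, create_conn, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # Pay the connection-setup cost once, up front.
            self._pool.put(create_conn())

    def acquire(self, timeout=1.0):
        # Blocks when all connections are in use, which caps
        # concurrent load on the database at the pool size.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)
```

Note the side effect: the pool is also a back-pressure mechanism, since callers wait for a free connection instead of piling new ones onto the database. Real poolers (e.g. PgBouncer) add health checks, transaction modes, and timeouts on top.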
Cache - a fast in-memory store that saves recent results so the database doesn't have to answer the same question repeatedly.
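A minimal read-through cache with expiry illustrates the pattern. The `TTLCache` class below is a sketch of the general technique, not a description of OpenAI's caching layer (which would use a shared store like Redis or memcached rather than per-process memory).

```python
import time

class TTLCache:
    """Toy read-through cache: serve repeated lookups from memory until the entry expires."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]            # cache hit: no database query
        value = loader(key)            # cache miss: ask the database once
        self._store[key] = (value, now + self.ttl)
        return value
```

Every hit is one database query the replicas never see; the TTL bounds how stale a served value can be.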
My Key Takeaways
A standard database setup can handle far more traffic than most people expect - the key is spreading the load intelligently rather than rebuilding everything
Splitting a database across many machines is extremely complex - it should only be done when every simpler option has been exhausted
When cached data expires under heavy traffic, thousands of requests can hit the database at once and take it down - preventing that pile-up is critical
Reusing database connections instead of opening new ones for every request cuts latency dramatically and prevents the database from being overwhelmed
Limiting traffic shouldn't only happen at the front door - every layer of the system needs to be able to protect itself when things get out of hand
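The cache-expiry pile-up above (often called a cache stampede or thundering herd) is usually prevented with "single-flight" logic: when many callers miss the same key at once, one of them recomputes and the rest wait for its result. A minimal threaded sketch, with class and method names of my own choosing:

```python
import threading

class SingleFlight:
    """Toy stampede guard: at most one concurrent recomputation per key."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()
        self.results = {}

    def do(self, key, loader):
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self.results:
                # Only the first thread pays the database cost;
                # the others block on the lock and reuse its result.
                self.results[key] = loader(key)
            return self.results[key]
```

In a real system this sits in front of the cache and clears its state once the entry is refreshed; the point is that N simultaneous misses become one database query instead of N.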
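"Every layer protects itself" usually means each service runs its own rate limiter rather than trusting the edge. A common mechanism is a token bucket; the sketch below is a generic single-threaded version (the `TokenBucket` name and parameters are illustrative, not a specific library's API).

```python
import time

class TokenBucket:
    """Toy per-layer rate limiter: tokens refill at a fixed rate; empty bucket means shed load."""

    def __init__(self, rate, capacity):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # max burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # reject here instead of overwhelming the database
```

Giving every layer its own bucket means a surge that slips past the front door still gets shed before it reaches the primary.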