Capacity Estimation
It’s Not a Math Test. It’s the Conversation That Builds Scalable Systems.
~1.2M
Peak Read QPS
Drives the need for aggressive caching and horizontal scaling.
~770 PB
Total Storage
Forces the use of object storage and database sharding.
500:1
Read/Write Ratio
Confirms that optimizing the read-path is the top priority.
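Headline figures like these come from back-of-the-envelope math, not measurement. The sketch below shows one set of illustrative inputs (assumed here, not stated in the source: 500M DAU, 100 feed reads per user per day, a 2x peak factor, 2 MB per photo, 10 years of retention) that lands near these numbers:

```python
SECONDS_PER_DAY = 86_400

# Illustrative assumptions for a photo-sharing app (not from the source).
dau = 500_000_000          # daily active users
reads_per_user = 100       # feed reads per user per day
read_write_ratio = 500     # matches the 500:1 figure above
peak_factor = 2            # peak load ~ 2x average
photo_size_mb = 2          # average upload size
retention_years = 10

reads_per_day = dau * reads_per_user                  # 5e10 reads/day
avg_read_qps = reads_per_day / SECONDS_PER_DAY        # ~579K QPS
peak_read_qps = avg_read_qps * peak_factor            # ~1.16M QPS

writes_per_day = reads_per_day / read_write_ratio     # 100M uploads/day
storage_per_day_tb = writes_per_day * photo_size_mb / 1e6      # ~200 TB/day
total_storage_pb = storage_per_day_tb * 365 * retention_years / 1000

print(f"Peak read QPS: {peak_read_qps:,.0f}")
print(f"Raw storage over {retention_years} years: ~{total_storage_pb:,.0f} PB")
```

This yields roughly 1.16M peak read QPS and ~730 PB of raw storage; replication and metadata overhead push the total toward the ~770 PB headline.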
The Vocabulary of Scale: Core Metrics
Traffic Metrics
- DAU/MAU: User base size & engagement.
- Stickiness: DAU/MAU ratio; how often users return.
Load Metrics
- QPS/RPS: Requests per second.
- Peak vs. Average: Design for the highest load.
Performance Metrics
- Latency (P99): User-perceived speed.
- Response Time: Total wait time for a user.
Data Metrics
- Storage: Total data footprint (TB, PB).
- Bandwidth: Data in (Ingress) & out (Egress).
The Engineer’s Toolkit: Latency Matters
Understanding the relative cost of operations is key. A network call is orders of magnitude slower than reading from memory, which is why caching is so powerful.
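To make "orders of magnitude" concrete, here is a small table of commonly cited, approximate latency figures (real hardware varies; treat these as rough rules of thumb, not benchmarks):

```python
# Commonly cited order-of-magnitude latencies, in nanoseconds (approximate).
LATENCY_NS = {
    "L1 cache reference":                 0.5,
    "Main memory reference":              100,
    "SSD random read (4 KB)":             150_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within a datacenter":     500_000,
    "Disk seek":                          10_000_000,
    "Round trip across continents":       150_000_000,
}

ratio = LATENCY_NS["Round trip within a datacenter"] / LATENCY_NS["Main memory reference"]
print(f"A datacenter round trip costs ~{ratio:,.0f}x a main-memory reference")
```

A cache hit served from local memory avoids that entire gap, which is exactly why caching dominates read-heavy designs.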
The 5-Step Estimation Framework
Clarify
Ask about scope, scale, and performance goals.
Break Down the Problem
Break it into smaller parts (QPS, Storage, etc.).
State Assumptions
State and justify every assumption you make.
Calculate
Do the back-of-the-envelope math.
Sanity Check
Does the number make sense in the real world?
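The five steps can be walked through on a fresh sub-problem, such as peak egress bandwidth. All inputs here are illustrative assumptions, not figures from the source:

```python
# 1. Clarify: scope = serving photo feeds; goal = size peak egress bandwidth.
# 2. Break down: bandwidth = peak read QPS x average response size.
# 3. State assumptions:
peak_read_qps = 1_200_000      # carried over from the earlier estimate
avg_response_kb = 50           # assumed: thumbnails plus JSON metadata

# 4. Calculate:
egress_gb_per_sec = peak_read_qps * avg_response_kb / 1e6

# 5. Sanity check: tens of GB/s is CDN territory, far beyond one origin server.
print(f"Peak egress: ~{egress_gb_per_sec:.0f} GB/s")
```

The sanity check is the step that converts a number into an architectural decision: 60 GB/s of egress argues for a CDN, not for more origin bandwidth.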
From Numbers to Architecture: The “So What?” Test
THE ESTIMATE
~1.2 Million Peak Read QPS
THE ARCHITECTURE
Multi-Layer Caching (CDN + Redis) & Horizontal Scaling behind a Load Balancer.
THE ESTIMATE
~770 PB Total Storage
THE ARCHITECTURE
Polyglot Persistence: Object Storage (S3) for files, Sharded NoSQL DB for metadata.
THE ESTIMATE
P99 Latency < 200ms
THE ARCHITECTURE
Asynchronous “Fan-out on Write” pattern using a Message Queue to pre-compute feeds.
The Read vs. Write Story
A 500:1 Read-to-Write Ratio
This single insight is critical. It tells us that the system is overwhelmingly read-heavy. Therefore, our primary engineering effort and budget should be focused on optimizing the read path. Aggressive caching isn’t just a nice-to-have; it’s the only way to build a performant and cost-effective system at this scale.
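A quick sketch shows why the cache hit ratio dominates the cost model. The per-replica capacity below is an assumed round number, not a figure from the source:

```python
import math

# Read path sizing at different cache hit ratios (illustrative assumptions).
peak_read_qps = 1_200_000
db_capacity_qps = 10_000       # assumed capacity of one database replica

for hit_ratio in (0.0, 0.90, 0.99):
    db_qps = peak_read_qps * (1 - hit_ratio)
    replicas = math.ceil(db_qps / db_capacity_qps)
    print(f"hit ratio {hit_ratio:.0%}: {db_qps:,.0f} QPS to DB -> {replicas} replicas")
```

Going from no cache to a 99% hit ratio cuts the database fleet from 120 replicas to 2, which is the difference between an unaffordable system and a practical one.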
Solving for Latency: The Fan-Out Pattern
When a strict latency SLO (e.g., <200ms) meets high read QPS, generating feeds on-the-fly is too slow. The architecture must shift from a "pull" model to an asynchronous "push" model.
SLOW: Pull-on-Read
1. User requests feed.
2. Server queries DB for all followed users.
3. Server queries DB for recent posts of ALL followed users.
4. Server sorts and merges results.
5. Return feed. ❌ Violates Latency SLO.
FAST: Push-on-Write (Fan-out)
1. User uploads a photo.
2. Upload service publishes event to a Message Queue.
3. Worker services consume event.
4. Workers pre-compute and update the cached feed for each follower.
5. When a user requests feed, it’s a simple, fast lookup from the Cache. ✅ Meets Latency SLO.
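The push-on-write flow above can be sketched in-process. This is a minimal toy model: real systems would use a message broker (e.g., Kafka or SQS) and a distributed cache (e.g., Redis), and all names here are illustrative:

```python
from collections import defaultdict, deque

followers = {"alice": ["bob", "carol"]}   # author -> follower list
queue = deque()                           # stands in for the message queue
feed_cache = defaultdict(list)            # follower -> precomputed feed

def upload(author, photo_id):
    """Upload service: persist the photo, then publish an event."""
    queue.append({"author": author, "photo": photo_id})

def fanout_worker():
    """Worker: consume events and push the post into each follower's feed."""
    while queue:
        event = queue.popleft()
        for follower in followers.get(event["author"], []):
            feed_cache[follower].insert(0, event["photo"])

def read_feed(user):
    """Read path: a single cache lookup -- no joins, no sorting."""
    return feed_cache[user]

upload("alice", "photo_1")
fanout_worker()
print(read_feed("bob"))   # ['photo_1']
```

Note where the work moved: the expensive fan-out happens once at write time, so every read is a constant-time cache lookup, which is what keeps the P99 under the SLO.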