Capacity Estimation
It’s Not a Math Test. It’s the Conversation That Builds Scalable Systems.
~1.2M
Peak Read QPS
Drives the need for aggressive caching and horizontal scaling.
~770 PB
Total Storage
Forces the use of object storage and database sharding.
500:1
Read/Write Ratio
Confirms that optimizing the read-path is the top priority.
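Headline figures like these come from back-of-the-envelope math, not measurement. The sketch below shows one set of illustrative inputs (assumed here, not stated in the source: 500M DAU, 100 feed reads per user per day, a 2x peak factor, 2 MB per photo, 10 years of retention) that lands near these numbers:

```python
SECONDS_PER_DAY = 86_400

# Illustrative assumptions for a photo-sharing app (not from the source).
dau = 500_000_000          # daily active users
reads_per_user = 100       # feed reads per user per day
read_write_ratio = 500     # matches the 500:1 figure above
peak_factor = 2            # peak load ~ 2x average
photo_size_mb = 2          # average upload size
retention_years = 10

reads_per_day = dau * reads_per_user                  # 5e10 reads/day
avg_read_qps = reads_per_day / SECONDS_PER_DAY        # ~579K QPS
peak_read_qps = avg_read_qps * peak_factor            # ~1.16M QPS

writes_per_day = reads_per_day / read_write_ratio     # 100M uploads/day
storage_per_day_tb = writes_per_day * photo_size_mb / 1e6      # ~200 TB/day
total_storage_pb = storage_per_day_tb * 365 * retention_years / 1000

print(f"Peak read QPS: {peak_read_qps:,.0f}")
print(f"Raw storage over {retention_years} years: ~{total_storage_pb:,.0f} PB")
```

This yields roughly 1.16M peak read QPS and ~730 PB of raw storage; replication and metadata overhead push the total toward the ~770 PB headline.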
The Vocabulary of Scale: Core Metrics
Traffic Metrics
- DAU/MAU: User base size & engagement.
- Stickiness: DAU/MAU ratio; how often users return.
Load Metrics
- QPS/RPS: Requests per second.
- Peak vs. Average: Design for the highest load.
Performance Metrics
- Latency (P99): User-perceived speed.
- Response Time: Total wait time for a user.
Data Metrics
- Storage: Total data footprint (TB, PB).
- Bandwidth: Data in (Ingress) & out (Egress).
The Engineer’s Toolkit: Latency Matters
Understanding the relative cost of operations is key. A network call is orders of magnitude slower than reading from memory, which is why caching is so powerful.
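To make "orders of magnitude" concrete, here is a small table of commonly cited, approximate latency figures (real hardware varies; treat these as rough rules of thumb, not benchmarks):

```python
# Commonly cited order-of-magnitude latencies, in nanoseconds (approximate).
LATENCY_NS = {
    "L1 cache reference":                 0.5,
    "Main memory reference":              100,
    "SSD random read (4 KB)":             150_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within a datacenter":     500_000,
    "Disk seek":                          10_000_000,
    "Round trip across continents":       150_000_000,
}

ratio = LATENCY_NS["Round trip within a datacenter"] / LATENCY_NS["Main memory reference"]
print(f"A datacenter round trip costs ~{ratio:,.0f}x a main-memory reference")
```

A cache hit served from local memory avoids that entire gap, which is exactly why caching dominates read-heavy designs.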
The 5-Step Estimation Framework
Clarify
Ask about scope, scale, and performance goals.
Break Down the Problem
Break it into smaller parts (QPS, Storage, etc.).
State Assumptions
State and justify every assumption you make.
Calculate
Do the back-of-the-envelope math.
Sanity Check
Does the number make sense in the real world?
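The five steps can be walked through on a fresh sub-problem, such as peak egress bandwidth. All inputs here are illustrative assumptions, not figures from the source:

```python
# 1. Clarify: scope = serving photo feeds; goal = size peak egress bandwidth.
# 2. Break down: bandwidth = peak read QPS x average response size.
# 3. State assumptions:
peak_read_qps = 1_200_000      # carried over from the earlier estimate
avg_response_kb = 50           # assumed: thumbnails plus JSON metadata

# 4. Calculate:
egress_gb_per_sec = peak_read_qps * avg_response_kb / 1e6

# 5. Sanity check: tens of GB/s is CDN territory, far beyond one origin server.
print(f"Peak egress: ~{egress_gb_per_sec:.0f} GB/s")
```

The sanity check is the step that converts a number into an architectural decision: 60 GB/s of egress argues for a CDN, not for more origin bandwidth.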
From Numbers to Architecture: The “So What?” Test
THE ESTIMATE
~1.2 Million Peak Read QPS
THE ARCHITECTURE
Multi-Layer Caching (CDN + Redis) & Horizontal Scaling behind a Load Balancer.
THE ESTIMATE
~770 PB Total Storage
THE ARCHITECTURE
Polyglot Persistence: Object Storage (S3) for files, Sharded NoSQL DB for metadata.
THE ESTIMATE
P99 Latency < 200ms
THE ARCHITECTURE
Asynchronous “Fan-out on Write” pattern using a Message Queue to pre-compute feeds.
The Read vs. Write Story
A 500:1 Read-to-Write Ratio
This single insight is critical. It tells us that the system is overwhelmingly read-heavy. Therefore, our primary engineering effort and budget should be focused on optimizing the read path. Aggressive caching isn’t just a nice-to-have; it’s the only way to build a performant and cost-effective system at this scale.
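A quick sketch shows why the cache hit ratio dominates the cost model. The per-replica capacity below is an assumed round number, not a figure from the source:

```python
import math

# Read path sizing at different cache hit ratios (illustrative assumptions).
peak_read_qps = 1_200_000
db_capacity_qps = 10_000       # assumed capacity of one database replica

for hit_ratio in (0.0, 0.90, 0.99):
    db_qps = peak_read_qps * (1 - hit_ratio)
    replicas = math.ceil(db_qps / db_capacity_qps)
    print(f"hit ratio {hit_ratio:.0%}: {db_qps:,.0f} QPS to DB -> {replicas} replicas")
```

Going from no cache to a 99% hit ratio cuts the database fleet from 120 replicas to 2, which is the difference between an unaffordable system and a practical one.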
Solving for Latency: The Fan-Out Pattern
When a strict latency SLO (e.g., <200ms) meets high read QPS, generating feeds on-the-fly is too slow. The architecture must shift from a "pull" model to an asynchronous "push" model.
SLOW: Pull-on-Read
1. User requests feed.
2. Server queries DB for all followed users.
3. Server queries DB for recent posts of ALL followed users.
4. Server sorts and merges results.
5. Return feed. ❌ Violates Latency SLO.
FAST: Push-on-Write (Fan-out)
1. User uploads a photo.
2. Upload service publishes event to a Message Queue.
3. Worker services consume event.
4. Workers pre-compute and update the cached feed for each follower.
5. When a user requests feed, it’s a simple, fast lookup from the Cache. ✅ Meets Latency SLO.
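The push-on-write flow above can be sketched in-process. This is a minimal toy model: real systems would use a message broker (e.g., Kafka or SQS) and a distributed cache (e.g., Redis), and all names here are illustrative:

```python
from collections import defaultdict, deque

followers = {"alice": ["bob", "carol"]}   # author -> follower list
queue = deque()                           # stands in for the message queue
feed_cache = defaultdict(list)            # follower -> precomputed feed

def upload(author, photo_id):
    """Upload service: persist the photo, then publish an event."""
    queue.append({"author": author, "photo": photo_id})

def fanout_worker():
    """Worker: consume events and push the post into each follower's feed."""
    while queue:
        event = queue.popleft()
        for follower in followers.get(event["author"], []):
            feed_cache[follower].insert(0, event["photo"])

def read_feed(user):
    """Read path: a single cache lookup -- no joins, no sorting."""
    return feed_cache[user]

upload("alice", "photo_1")
fanout_worker()
print(read_feed("bob"))   # ['photo_1']
```

Note where the work moved: the expensive fan-out happens once at write time, so every read is a constant-time cache lookup, which is what keeps the P99 under the SLO.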