Mastering Capacity Estimation
Your ultimate guide to acing system design interviews and building scalable systems.
Why Capacity Estimation Matters
Capacity estimation is more than arithmetic; it is the foundation for designing robust, scalable systems.
Guide Design Decisions
Estimates shape your choice of databases, caching, and scaling strategies.
Spot Bottlenecks Early
Identify weak points in your system before they cause failures.
Showcase Expertise
Demonstrate senior-level thinking in interviews.
Build Resilient Systems
Plan for peak loads to handle real-world traffic spikes.
Back-of-the-Envelope Basics
Quick calculations to guide system design without overcomplicating.
What is BoE?
- Rough calculations for system parameters.
- Prevents resource misallocation.
- Guides component choices like load balancers.
Key Tips
- Use approximate, round numbers.
- Keep calculations quick (~5-10 min).
- Validate with simple assumptions.
Estimation Cheat Sheet
Units & Multiples
- Thousand (Kilo – KB): 10³
- Million (Mega – MB): 10⁶
- Billion (Giga – GB): 10⁹
- Trillion (Tera – TB): 10¹²
- Quadrillion (Peta – PB): 10¹⁵
Data Size Assumptions
- Character (Unicode, assuming a 2-byte encoding such as UTF-16): 2 bytes (ASCII text in UTF-8 takes 1 byte per character)
- Long/Double: 8 bytes
- Average Image: 300 KB
- 1-min SD Video: 10-20 MB
- Seconds in a Day: 86,400 (round to ~100,000 for quick math)
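The cheat sheet can be captured as a handful of constants, which keeps later arithmetic consistent. The sketch below is illustrative only: the helper name `human_readable` and the one-million-image workload are assumptions, not part of the guide. It uses decimal multiples (powers of ten), matching the table above.

```python
# Decimal multiples from the cheat sheet (powers of ten, not powers of two).
KB = 10**3
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

SECONDS_PER_DAY_EXACT = 24 * 60 * 60  # 86,400
SECONDS_PER_DAY_ROUNDED = 10**5       # rounded figure for quick math

AVG_IMAGE_BYTES = 300 * KB            # cheat-sheet assumption


def human_readable(num_bytes: float) -> str:
    """Format a byte count using the largest decimal unit that fits."""
    for name, size in (("PB", PB), ("TB", TB), ("GB", GB), ("MB", MB), ("KB", KB)):
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {name}"
    return f"{num_bytes:.0f} B"


# Hypothetical workload: one million average-sized images.
total = 1_000_000 * AVG_IMAGE_BYTES
print(human_readable(total))  # 300.0 GB
```

Rounding 86,400 up to 100,000 overstates the length of a day by about 16%, which is well within the tolerance of a back-of-the-envelope estimate.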
System Design Interview Framework
Capacity estimation turns vague requirements into concrete numbers, shaping your system’s architecture.
Key Metrics & Their Connections
Metrics like QPS, storage, and CPU are interconnected. A change in one impacts the others.
Focus on the bottleneck, which is often database QPS.
- DAU/MAU: Drives traffic estimates.
- QPS: Measures server load.
- Storage: Total data over time.
- Bandwidth: Network capacity.
- Memory/CPU: Processing resources.
Step-by-Step Estimation
A structured approach to estimate capacity for any system.
1. Traffic (QPS)
Inputs:
- User base size
- DAU/MAU percentage
- Queries per user/day
- Seconds/day (~10⁵)
QPS = (Active Users × Queries/User) ÷ Seconds/Day
Example: 50M DAU, 20 queries/user → 1B queries/day ÷ 10⁵ = 10,000 QPS.
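The traffic formula translates directly into a few lines of code. This is a minimal sketch: the function name `estimate_qps` and its parameters are illustrative choices, and the result is an average rate, not a peak.

```python
SECONDS_PER_DAY = 10**5  # rounded from 86,400 for quick math


def estimate_qps(daily_active_users: int,
                 queries_per_user: int,
                 seconds_per_day: int = SECONDS_PER_DAY) -> float:
    """Average QPS = (active users x queries per user) / seconds per day."""
    return daily_active_users * queries_per_user / seconds_per_day


# The example from the text: 50M DAU, 20 queries per user per day.
print(estimate_qps(50_000_000, 20))  # 10000.0
```

Passing `seconds_per_day=86_400` instead gives roughly 11,574 QPS, which shows how little the rounding changes the conclusion.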
2. Storage
Inputs:
- Data types (e.g., text, images)
- Size per item
- Items/user/day
- Retention period
Daily Storage = Active Users × Items/User × Size/Item
Total = Daily Storage × Days × Growth Factor
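As a worked instance of the two storage formulas, the sketch below plugs in hypothetical inputs: 50M active users, 2 items per user per day, 300 KB per item, one year of retention, and a growth factor of 1.2. All of these values and the function names are assumptions chosen for illustration.

```python
TB = 10**12
PB = 10**15


def daily_storage_bytes(active_users: int, items_per_user: int, bytes_per_item: int) -> int:
    """Daily Storage = Active Users x Items/User x Size/Item."""
    return active_users * items_per_user * bytes_per_item


def total_storage_bytes(daily_bytes: int, retention_days: int, growth_factor: float) -> float:
    """Total = Daily Storage x Days x Growth Factor."""
    return daily_bytes * retention_days * growth_factor


# Hypothetical inputs, not taken from a real system.
daily = daily_storage_bytes(50_000_000, 2, 300_000)
total = total_storage_bytes(daily, retention_days=365, growth_factor=1.2)

print(f"{daily / TB:.0f} TB/day")  # 30 TB/day
print(f"{total / PB:.2f} PB")      # 13.14 PB
```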
3. RAM (Cache)
Inputs:
- Data to cache
- Size per cached item
- Percentage of users/items
RAM = Cached Items × Size/Item
Machines = RAM ÷ RAM/Machine (round up)
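The cache sizing steps can be sketched as follows. The inputs are hypothetical (1 billion items, 20% of them cached, 1 KB per item, 64 GB of usable RAM per machine), and the function names are illustrative. The machine count is rounded up because a fraction of a machine cannot be provisioned.

```python
import math

KB = 10**3
GB = 10**9


def cache_ram_bytes(total_items: int, cache_percent: int, bytes_per_item: int) -> int:
    """RAM = Cached Items x Size/Item, where cached items are a percentage of all items."""
    cached_items = total_items * cache_percent // 100
    return cached_items * bytes_per_item


def cache_machines(ram_bytes: int, ram_per_machine_bytes: int) -> int:
    """Machines = RAM / RAM per machine, rounded up."""
    return math.ceil(ram_bytes / ram_per_machine_bytes)


# Hypothetical inputs, not taken from a real system.
ram = cache_ram_bytes(1_000_000_000, cache_percent=20, bytes_per_item=1 * KB)
print(ram // GB)                      # 200 (GB)
print(cache_machines(ram, 64 * GB))   # 4
```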
4. Servers
Inputs:
- Total QPS
- QPS/server (CPU, latency)
Servers = Total QPS ÷ QPS/Server
Example: 10,000 QPS, 100 QPS/server → 100 servers.
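The server count follows the same pattern. In this sketch, `peak_multiplier` is an added assumption (not part of the formula above) that lets the estimate account for traffic spikes; leaving it at 1.0 reproduces the example from the text.

```python
import math


def estimate_servers(total_qps: float,
                     qps_per_server: float,
                     peak_multiplier: float = 1.0) -> int:
    """Servers = Total QPS / QPS per server, rounded up.

    peak_multiplier is a hypothetical knob for sizing against peak
    rather than average traffic.
    """
    return math.ceil(total_qps * peak_multiplier / qps_per_server)


print(estimate_servers(10_000, 100))                       # 100
print(estimate_servers(10_000, 100, peak_multiplier=2.0))  # 200
```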
CAP Theorem
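Consistency, Availability, Partition Tolerance: a distributed system cannot guarantee all three at once. Because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability. Your estimates guide this trade-off.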
Latency Numbers to Know
Understand operation speeds to optimize system performance.
Operation | Time (ns) | Relative Cost |
---|---|---|
L1 cache reference | 0.5 | 1x |
Branch mispredict | 5 | 10x |
L2 cache reference | 7 | 14x |
Mutex lock/unlock | 100 | 200x |
Main memory reference | 100 | 200x |
Compress 1 KB | 10,000 | 20,000x |
Send 2 KB over 1 Gbps network | 20,000 | 40,000x |
Read 1 MB from memory | 250,000 | 500,000x |
Round trip (datacenter) | 500,000 | 1,000,000x |
Disk seek | 10,000,000 | 20,000,000x |
Key Insights:
- Memory is fast; disks are slow.
- Avoid disk seeks for performance.
- Compress data to save network time.
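To make the ratios concrete, the table can be loaded into a dictionary and compared programmatically. This is a small sketch using only the figures listed above; the names `LATENCY_NS` and `relative_cost` are illustrative.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Main memory reference": 100,
    "Compress 1 KB": 10_000,
    "Send 2 KB over 1 Gbps network": 20_000,
    "Read 1 MB from memory": 250_000,
    "Round trip (datacenter)": 500_000,
    "Disk seek": 10_000_000,
}


def relative_cost(operation: str, baseline: str = "L1 cache reference") -> float:
    """How many times slower an operation is than the baseline operation."""
    return LATENCY_NS[operation] / LATENCY_NS[baseline]


print(relative_cost("Disk seek"))                           # 20000000.0
print(relative_cost("Disk seek", "Main memory reference"))  # 100000.0
```

One disk seek costs as much as about 100,000 main memory references, which is why the insights above recommend keeping hot data in memory.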
Real-World Examples
Compare how estimates shape system design.
Instagram: Read-Heavy
High read volume from feeds requires robust caching.
Implication:
Use CDNs and Redis for efficient read handling.
Common Pitfalls
❌ Don’t Skip Clarifications
Designing without clear requirements leads to errors.
✅ Ask Questions
Clarify users, features, and scale upfront.
❌ Don’t Overcomplicate Math
Complex calculations waste time and risk errors.
✅ Use Simple Numbers
Round numbers for quick, reasonable estimates.