Online/Offline Indicator

Below, I’ll provide a step-by-step evolution of the online/offline indicator system based on the transcript. For each major evolution point, I’ll include:

A brief description of the step.
Key decisions or changes.
A textual diagram (using simple ASCII art for visualization, as it’s easy to copy into notes). These are “snapshots” of the system at that point, focusing on components like client, API server, DB, and data flow.

This way, you can use them for quick revision.

Step 1: Initial Requirement and Storage Idea

Description: Start with the basic requirement: Show if a user is online (green dot) or offline. First thought: Store exactly what’s needed – user ID as key, online/offline as value (boolean).
Key Points: Focus on database as the brittle component. Access pattern is key-value. No schema decided yet.
Diagram Snapshot (Simple key-value storage):

[Client] --(Query Status)--> [API Server] --(Read)--> [Database]
                                         |
                                         v
User Table:
+---------+--------+
| User ID | Status |
+---------+--------+
|   123   | Online |
|   456   | Offline|
+---------+--------+
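
A minimal Go sketch of this first idea, assuming an in-memory map stands in for the database. The names StatusStore, SetOnline and IsOnline are hypothetical and do not come from the transcript:

```go
package main

import "fmt"

// StatusStore is a hypothetical in-memory stand-in for the key-value table:
// the user ID is the key, online/offline is the boolean value.
type StatusStore map[uint32]bool

// SetOnline records the current status for one user.
func (s StatusStore) SetOnline(userID uint32, online bool) {
	s[userID] = online
}

// IsOnline reports the stored status; users with no entry read as offline.
func (s StatusStore) IsOnline(userID uint32) bool {
	return s[userID]
}

func main() {
	store := StatusStore{}
	store.SetOnline(123, true)
	store.SetOnline(456, false)
	fmt.Println(store.IsOnline(123), store.IsOnline(456), store.IsOnline(789))
}
```

The access pattern is a pure key lookup, which is why a key-value shape is enough at this stage.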

Step 2: Define Access Pattern and Batch API

Description: Decide how to read data. Expose a batch GET API (/status/users) to fetch status for multiple users efficiently, avoiding N calls for N users.
Key Points: Batch everything possible (APIs, queries) for efficiency. Prototype later shows batch vs. non-batch performance.
Diagram Snapshot (Adding API layer):

[Client] --(GET /status/users?ids=123,456)--> [API Server] --(Batch Query: SELECT status WHERE user_id IN (...))--> [Database]
                                                              |
                                                              v
User Table (unchanged):
+---------+--------+
| User ID | Status |
+---------+--------+
|   123   | Online |
|   456   | Offline|
+---------+--------+
Response: {123: "Online", 456: "Offline"}
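
The batch read can be sketched in Go as follows. This is an illustration only: parseIDs and batchStatus are hypothetical helper names, the store is an in-memory map rather than a real database, and treating unknown users as "Offline" is an assumption:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIDs turns the raw "ids" query value, e.g. "123,456", into user IDs.
func parseIDs(raw string) ([]uint32, error) {
	parts := strings.Split(raw, ",")
	ids := make([]uint32, 0, len(parts))
	for _, p := range parts {
		n, err := strconv.ParseUint(strings.TrimSpace(p), 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid user id %q: %w", p, err)
		}
		ids = append(ids, uint32(n))
	}
	return ids, nil
}

// batchStatus answers one request for many users in a single pass over the
// store, instead of one call per user.
func batchStatus(store map[uint32]string, ids []uint32) map[uint32]string {
	out := make(map[uint32]string, len(ids))
	for _, id := range ids {
		status, ok := store[id]
		if !ok {
			status = "Offline" // assumption: unknown users are reported offline
		}
		out[id] = status
	}
	return out
}

func main() {
	store := map[uint32]string{123: "Online", 456: "Offline"}
	ids, err := parseIDs("123,456")
	if err != nil {
		panic(err)
	}
	fmt.Println(batchStatus(store, ids))
}
```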

Step 3: Updating Status – Push vs. Pull

Description: How to update status? Evaluate push (client sends heartbeats) vs. pull (server queries clients). Pull is not feasible: the server cannot initiate a connection to a client, and it does not know the client's IP. Go with push: clients send periodic “I’m alive” calls.
Key Points: Always evaluate opposites. Push chosen; expose POST /heartbeat.
Diagram Snapshot (Adding update flow):

Push (Chosen):
[Client] --(POST /heartbeat every 5s)--> [API Server] --(UPDATE status = Online WHERE user_id = ?)--> [Database]

Pull (Rejected):
[API Server] --(Poll: Are you alive?)--> [Client]  (Not possible)

User Table:
+---------+--------+
| User ID | Status |
+---------+--------+
|   123   | Online |  <-- Updated on heartbeat
+---------+--------+
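
A rough Go sketch of the push path, using only the standard library. The JSON body shape {"user_id": 123}, the server type, and the 204 response are assumptions made for illustration; the in-memory map stands in for the database:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// heartbeatRequest is an assumed JSON body: {"user_id": 123}.
type heartbeatRequest struct {
	UserID uint32 `json:"user_id"`
}

// server holds the status table in memory; a real system would use a database.
type server struct {
	mu     sync.Mutex
	online map[uint32]bool
}

// handleHeartbeat marks the calling user as online on every POST /heartbeat.
func (s *server) handleHeartbeat(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	var req heartbeatRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request body", http.StatusBadRequest)
		return
	}
	s.mu.Lock()
	s.online[req.UserID] = true
	s.mu.Unlock()
	w.WriteHeader(http.StatusNoContent)
}

// postHeartbeat sends one in-process heartbeat and returns the HTTP status code.
func postHeartbeat(s *server, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/heartbeat", strings.NewReader(body))
	rec := httptest.NewRecorder()
	s.handleHeartbeat(rec, req)
	return rec.Code
}

func main() {
	s := &server{online: map[uint32]bool{}}
	fmt.Println(postHeartbeat(s, `{"user_id":123}`), s.online[123])
}
```

Nothing in this version ever sets a user back to offline, so a client that stops sending heartbeats stays marked online.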

Step 4: Evolve Schema for Reliability (Handle Crashes)

Description: Boolean won’t work if device crashes (no “offline” call). Evolve to store last_heartbeat timestamp. Mark offline if no heartbeat for configurable time (e.g., 30s).
Key Points: Implementation details evolve schema. Update on heartbeat: SET last_heartbeat = NOW().
Diagram Snapshot (Schema change):

[Client] --(POST /heartbeat)--> [API Server] --(UPDATE last_heartbeat = CURRENT_TIME WHERE user_id = ?)--> [Database]

User Table (Evolved):
+---------+----------------+
| User ID | Last Heartbeat |
+---------+----------------+
|   123   | 1724400000     |  <-- Epoch timestamp
+---------+----------------+

Step 5: Update GET Logic

Description: Change the GET implementation: if there is no entry, or last_heartbeat is older than now - 30s, the user is Offline; otherwise Online. Done in batch.
Key Points: Business logic adapts to new schema.
Diagram Snapshot (Full read/update flow):

Read Flow:
[Client] --(GET /status/users?ids=...)--> [API Server] --(SELECT last_heartbeat WHERE user_id IN (...))--> [Database]
                                          |
                                          v
Logic: IF last_heartbeat > (now - 30s) THEN Online ELSE Offline

Update Flow (unchanged from Step 4)
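
The read rule can be sketched in Go like this. The function names and the use of Unix seconds passed in as plain int64 values are assumptions for illustration; the 30s window is the example value from the notes:

```go
package main

import (
	"fmt"
	"time"
)

// offlineAfter is the configurable silence window from the notes (30s).
const offlineAfter = 30 * time.Second

// isOnline reports whether a heartbeat at lastEpoch (Unix seconds) is recent
// enough at nowEpoch. A user with no entry (found == false) is offline.
func isOnline(lastEpoch int64, found bool, nowEpoch int64) bool {
	if !found {
		return false
	}
	return lastEpoch > nowEpoch-int64(offlineAfter/time.Second)
}

// batchStatus applies the rule to every requested user in one pass.
func batchStatus(lastHeartbeat map[uint32]int64, ids []uint32, nowEpoch int64) map[uint32]bool {
	out := make(map[uint32]bool, len(ids))
	for _, id := range ids {
		last, found := lastHeartbeat[id]
		out[id] = isOnline(last, found, nowEpoch)
	}
	return out
}

func main() {
	now := time.Now().Unix()
	table := map[uint32]int64{123: now - 5, 456: now - 120}
	fmt.Println(batchStatus(table, []uint32{123, 456, 789}, now))
}
```

Because offline is derived at read time, a crashed device needs no explicit "offline" call: its entry simply ages past the window.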

Step 6: Capacity Estimation

Description: Calculate storage: User ID (4 bytes) + Timestamp (4 bytes) = 8 bytes/entry. 1B users = 8GB (not a problem). Focus on queries/load (e.g., 1M active users, heartbeats every 10s = 6M updates/min).
Key Points: Storage is rarely the issue; compute and query load are. Back decisions with data.
Diagram Snapshot (Scale view):

Scale Calc:
Users: 1B entries * 8 bytes = 8GB (fits on a phone)

Load:
[1M Clients] --(6 heartbeats/min each)--> [API Server] --(6M Updates/min)--> [Database]

User Table:
+---------+----------------+
| User ID | Last Heartbeat |  <-- One entry per user
+---------+----------------+
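
The arithmetic above, written out as a small Go sketch. The helper names are hypothetical; the inputs are the figures from the notes:

```go
package main

import "fmt"

// storageBytes returns the total bytes for `users` entries of bytesPerEntry each.
func storageBytes(users, bytesPerEntry int64) int64 {
	return users * bytesPerEntry
}

// updatesPerMinute returns heartbeat writes per minute for activeUsers clients
// that each send one heartbeat every intervalSeconds.
func updatesPerMinute(activeUsers, intervalSeconds int64) int64 {
	return activeUsers * 60 / intervalSeconds
}

func main() {
	const entry = 4 + 4 // user ID (4 bytes) + epoch timestamp (4 bytes)
	total := storageBytes(1_000_000_000, entry)
	fmt.Printf("storage: %d bytes (%d GB)\n", total, total/1_000_000_000)

	perMin := updatesPerMinute(1_000_000, 10)
	fmt.Printf("load: %d updates/min (%d per second)\n", perMin, perMin/60)
}
```

The storage figure is small, while 6M updates/min works out to a sustained 100,000 writes per second, which is where the design pressure actually sits.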

Step 7: Prototype and Benchmark

Description: Build simple Go code: POST /heartbeat updates timestamp; GET /status checks it. Benchmark batch vs. non-batch (e.g., 662μs batch vs. 5ms non-batch for 20 users).
Key Points: Prototype for nuances/benchmarking. Focus on goal (e.g., no auth).
Diagram Snapshot (Prototype components):

Prototype:
[Client] -- POST /heartbeat {user_id:123} --> [Go Server] -- REPLACE INTO heartbeats (user_id, last_heartbeat) VALUES (?, time.Now()) --> [MySQL DB]

[Client] -- GET /status/users?ids=1,2,... --> [Go Server] -- SELECT ... IN (...) + Logic --> Response {1: true, ...}
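
One way the batch read could be built in Go is sketched below. The table and column names follow the REPLACE statement in the diagram; the helper names batchQuery and args are hypothetical, and the actual prototype code is not shown in the notes:

```go
package main

import (
	"fmt"
	"strings"
)

// batchQuery builds one SELECT with n placeholders, so n users cost a single
// round trip instead of n separate queries.
func batchQuery(n int) string {
	if n <= 0 {
		return ""
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?,", n), ",")
	return "SELECT user_id, last_heartbeat FROM heartbeats WHERE user_id IN (" + placeholders + ")"
}

// args converts user IDs to the []any form that database/sql query methods accept.
func args(ids []uint32) []any {
	out := make([]any, len(ids))
	for i, id := range ids {
		out[i] = id
	}
	return out
}

func main() {
	ids := []uint32{1, 2, 3}
	fmt.Println(batchQuery(len(ids)), args(ids))
	// With a real *sql.DB this would be: db.Query(batchQuery(len(ids)), args(ids)...)
}
```

The non-batch variant would issue one SELECT per user, paying the network round trip 20 times for 20 users, which is the likely source of the gap seen in the benchmark.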

Step 8: Alternative Storage – Only Online Users with Expiration

Description: Store only for online users; expire entries after 30s if no heartbeat. Evaluate: Cron job to delete old vs. offload to DB (TTL).
Key Points: Reduce data; offload if possible (no reinvention).
Diagram Snapshot (Expiration flow):

[Client] --(Heartbeat)--> [API Server] --(SET key=user_id, value=timestamp, EXPIRE 30s)--> [DB with TTL e.g., Redis]

If no heartbeat >30s: Auto-delete entry --> Offline

User Table (Only online):
+---------+----------------+
| User ID | Last Heartbeat |  <-- Expires if old
+---------+----------------+
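
Both expiration options can be sketched in Go with an in-memory store. This is only an illustration of the behaviour that a TTL-capable database would provide; ttlStore and its methods are hypothetical names, and times are plain Unix seconds:

```go
package main

import "fmt"

// ttlStore keeps only online users: user ID -> expiry time in Unix seconds.
type ttlStore struct {
	ttlSeconds int64
	expiresAt  map[uint32]int64
}

func newTTLStore(ttlSeconds int64) *ttlStore {
	return &ttlStore{ttlSeconds: ttlSeconds, expiresAt: map[uint32]int64{}}
}

// Heartbeat (re)creates the entry and pushes its expiry ttlSeconds ahead.
func (s *ttlStore) Heartbeat(userID uint32, now int64) {
	s.expiresAt[userID] = now + s.ttlSeconds
}

// Online is true only while a live entry exists; a stale entry is deleted
// lazily when it is read.
func (s *ttlStore) Online(userID uint32, now int64) bool {
	exp, ok := s.expiresAt[userID]
	if !ok {
		return false
	}
	if now >= exp {
		delete(s.expiresAt, userID)
		return false
	}
	return true
}

// Sweep is the cron-job alternative: delete every expired entry in one pass
// and return how many were removed.
func (s *ttlStore) Sweep(now int64) int {
	removed := 0
	for id, exp := range s.expiresAt {
		if now >= exp {
			delete(s.expiresAt, id)
			removed++
		}
	}
	return removed
}

func main() {
	s := newTTLStore(30)
	s.Heartbeat(123, 1000)
	fmt.Println(s.Online(123, 1010), s.Online(123, 1031))
}
```

With a database that supports TTL natively, neither Sweep nor the lazy delete has to be written or operated by the team.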

Step 9: Database Choice – Redis vs. DynamoDB

Description: For key-value + expiration. Evaluate: Open source vs. managed, vendor lock-in, pricing (DynamoDB per 1KB write), performance (prototype), scalability (DynamoDB partitions concurrent writes).
Key Points: Non-technical factors key (e.g., multi-cloud, cost at scale).
Diagram Snapshot (DB options):

Option 1: Redis (Self-hosted, in-memory, open source)
[API] --(SETEX user_id 30 timestamp)--> [Redis Cluster]

Option 2: DynamoDB (Managed, AWS, partitioned)
[API] --(PutItem with TTL)--> [DynamoDB]

Pros/Cons:
Redis: Flexible deployment; performance is probably better, but that is a guess until benchmarked.
Dynamo: Easy scale, but lock-in, per-write cost.
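
One common way to keep this choice reversible is to hide the store behind a small interface. The Go sketch below is an assumption, not something from the transcript: PresenceStore, memoryStore and statusFor are hypothetical names, and the in-memory implementation only stands in for a Redis- or DynamoDB-backed one:

```go
package main

import "fmt"

// PresenceStore is a hypothetical interface covering what the design needs
// from any key-value store with expiration.
type PresenceStore interface {
	// Touch stores the heartbeat and (re)starts the TTL.
	Touch(userID uint32, now int64)
	// Alive reports whether an unexpired entry exists.
	Alive(userID uint32, now int64) bool
}

// memoryStore is an in-memory implementation for prototyping; Redis- or
// DynamoDB-backed versions would satisfy the same interface.
type memoryStore struct {
	ttl     int64
	expires map[uint32]int64
}

func newMemoryStore(ttl int64) *memoryStore {
	return &memoryStore{ttl: ttl, expires: map[uint32]int64{}}
}

func (m *memoryStore) Touch(userID uint32, now int64) { m.expires[userID] = now + m.ttl }

func (m *memoryStore) Alive(userID uint32, now int64) bool {
	exp, ok := m.expires[userID]
	return ok && now < exp
}

// statusFor depends only on the interface, not on a specific vendor.
func statusFor(store PresenceStore, ids []uint32, now int64) map[uint32]bool {
	out := make(map[uint32]bool, len(ids))
	for _, id := range ids {
		out[id] = store.Alive(id, now)
	}
	return out
}

func main() {
	var store PresenceStore = newMemoryStore(30)
	store.Touch(123, 1000)
	fmt.Println(statusFor(store, []uint32{123, 456}, 1010))
}
```

The same interface also makes it straightforward to benchmark both candidates with identical calling code.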

Step 10: Scaling Considerations (High Updates, Connections)

Description: Handle 6M updates/min: this many tiny writes (micro-updates) is the problematic part. Mention connection pooling, since opening a new connection per request pays the TCP 3-way handshake every time.
Key Points: Optimize connections; stateless APIs easy to scale.
Diagram Snapshot (Scaled system):

[Load Balancer] --> [API Servers (Stateless, Scaled)] --(Pooled Connections)--> [DB (Sharded/Partitioned)]

Load: 1M users * 6/min = 6M updates/min (about 100K updates/s)
Optimization: Batch? Pool connections to avoid handshake per heartbeat.
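
One possible reading of "Batch?" is to coalesce heartbeats on the API server and flush them as a single write. The Go sketch below is an assumption rather than the transcript's design; coalescer and its methods are hypothetical names, and the type is not safe for concurrent use without a lock:

```go
package main

import "fmt"

// coalescer buffers heartbeats and keeps only the newest timestamp per user,
// so many small updates become one batched write.
type coalescer struct {
	pending map[uint32]int64
}

func newCoalescer() *coalescer {
	return &coalescer{pending: map[uint32]int64{}}
}

// Add records a heartbeat; repeated heartbeats from one user collapse into
// the latest one.
func (c *coalescer) Add(userID uint32, epoch int64) {
	if epoch > c.pending[userID] {
		c.pending[userID] = epoch
	}
}

// Flush hands the buffered batch to write (for example one multi-row
// statement) and clears the buffer. It returns the number of rows written.
func (c *coalescer) Flush(write func(batch map[uint32]int64)) int {
	n := len(c.pending)
	if n == 0 {
		return 0
	}
	batch := c.pending
	c.pending = map[uint32]int64{}
	write(batch)
	return n
}

func main() {
	c := newCoalescer()
	c.Add(1, 100)
	c.Add(1, 110)
	c.Add(2, 105)
	n := c.Flush(func(batch map[uint32]int64) { fmt.Println(batch) })
	fmt.Println("rows written:", n)
}
```

The trade-off is freshness: a status can lag by up to one flush interval, so the interval has to stay well below the offline threshold.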

Looking at this transcript, your mentor shared incredibly valuable insights that go far beyond just building an online/offline indicator. Let me help you extract the hidden knowledge and system design principles that will make you excel in interviews.

The Hidden System Design Framework

1. Start with Data, Not Architecture

Your mentor emphasized: “Database is the most brittle component of your architecture”

Hidden insight: In interviews, always start by:

Defining what you’re storing and why
Understanding access patterns first
Then choosing the database based on requirements, not preferences

Interview application: Don’t jump to “I’ll use MongoDB” – explain your data model, access patterns, then justify your choice.

2. Always Evaluate Opposites

“In computer science, you would always see opposites in action. Push versus pull.”

Hidden insight: For every design decision, consider the alternative:

Push vs Pull
Sync vs Async
SQL vs NoSQL
Batch vs Individual operations

Interview tip: When you propose a solution, immediately mention the alternative and explain why you chose one over the other.

3. The Power of Batching

The mentor showed 5ms non-batch vs 662 microseconds batch for 20 users – roughly a 7.5x performance improvement!

Hidden insight: Batch operations wherever possible:

API calls (get status of multiple users at once)
Database queries (IN clause vs multiple SELECT statements)
Network requests

Interview application: Always mention batching as an optimization technique.

4. Schema Evolution During Implementation

Notice how the schema evolved from user_id + boolean to user_id + timestamp?

Hidden insight: Initial designs always evolve when you think through implementation details.

Interview strategy:

Start with simple schema
Walk through the implementation
Naturally evolve your design
Show this evolution as strength, not weakness

5. Capacity Estimation Reality Check

“8GB for 1 billion users – you have more storage in your phone”

Hidden insight: Storage is rarely the bottleneck – it’s usually:

Concurrent requests
Network bandwidth
CPU for processing
Database connections

Interview approach: Do the math, but focus on the real bottlenecks.

The Non-Technical Factors (This is Gold!)

Your mentor spent significant time on these – they’re often more important than technical decisions:

1. Vendor Lock-in Considerations

Redis (open source, multi-cloud) vs DynamoDB (AWS only)
Can you switch providers if needed?
Will your cloud provider become your competitor?

2. Pricing Models Matter

DynamoDB charges per operation, with writes billed in 1KB units (even a 4-byte update is billed as a full 1KB write)
Self-hosted Redis costs only the infrastructure it runs on
Consider your usage patterns
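
To see why the per-write model matters at this load, here is a small Go sketch of the rounding. It assumes standard writes are rounded up to the next 1KB with a minimum of one unit; writeUnits is a hypothetical helper, and no actual prices are modelled:

```go
package main

import "fmt"

// writeUnits returns how many 1KB write units a single write of itemBytes
// consumes when sizes are rounded up to the next 1KB, with a minimum of one.
func writeUnits(itemBytes int64) int64 {
	const unit = 1024
	if itemBytes <= 0 {
		return 1
	}
	return (itemBytes + unit - 1) / unit
}

func main() {
	perWrite := writeUnits(8) // an 8-byte heartbeat entry still costs one unit
	writesPerSecond := int64(6_000_000 / 60)
	fmt.Println("units per write:", perWrite)
	fmt.Println("units per second:", writesPerSecond*perWrite)
}
```

At 6M heartbeats per minute this is 100,000 write units every second, each paying for 1KB while carrying only a few bytes.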

3. Operational Complexity

Can you manage it in production?
Does it have good tooling?
How mature is the technology?

4. Business Context

Early startup vs enterprise
Technical debt tolerance
Team expertise

Advanced Interview Strategies

1. The Prototype Mindset

Your mentor built a working prototype in minutes. In interviews:

Sketch out key components
Walk through the critical path
Show you can think about implementation details

2. Challenge Every Decision

“We critically challenge every single decision that we make”

In interviews:

Question your own assumptions
Propose alternatives
Show trade-off analysis

3. Think About Edge Cases Early

What if the device crashes before sending “offline” signal?
What if the network is unreliable?
How do you handle expired data?

4. Connection Pooling and Resource Management

The mentor mentioned connection pooling – this shows deep technical understanding:

Database connections are expensive to create
Pool connections for better performance
Consider resource limits
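
The idea behind pooling can be sketched in Go with a buffered channel. This is a toy illustration, not a production pool: conn stands in for a real database connection, and the pool type and its methods are hypothetical. In practice Go's database/sql package already pools connections internally:

```go
package main

import (
	"errors"
	"fmt"
)

// conn stands in for an expensive resource such as a database connection.
type conn struct{ id int }

// pool hands out a fixed set of pre-opened connections for reuse.
type pool struct {
	idle   chan *conn
	opened int
}

// newPool opens size connections up front, paying the setup cost once.
func newPool(size int) *pool {
	p := &pool{idle: make(chan *conn, size)}
	for i := 0; i < size; i++ {
		p.opened++
		p.idle <- &conn{id: p.opened}
	}
	return p
}

var errExhausted = errors.New("pool exhausted")

// Get returns an idle connection, or an error when all are in use
// (a real pool would usually block or time out instead).
func (p *pool) Get() (*conn, error) {
	select {
	case c := <-p.idle:
		return c, nil
	default:
		return nil, errExhausted
	}
}

// Put returns a connection to the pool so the next request can reuse it.
func (p *pool) Put(c *conn) { p.idle <- c }

func main() {
	p := newPool(2)
	for i := 0; i < 5; i++ { // five requests share two connections
		c, err := p.Get()
		if err != nil {
			panic(err)
		}
		p.Put(c)
	}
	fmt.Println("connections opened:", p.opened)
}
```

The fixed size doubles as a resource limit: the database never sees more connections than the pool was created with.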

Interview Framework Based on This

1. Clarify Requirements (5 minutes)
2. Data Modeling (10 minutes)
   - What are we storing?
   - Access patterns?
   - Schema design
3. API Design (5 minutes)
   - Batch operations where possible
   - RESTful design
4. Database Choice (10 minutes)
   - Technical factors
   - Non-technical factors (cost, vendor lock-in, etc.)
5. Scale Considerations (10 minutes)
   - Capacity estimation
   - Performance bottlenecks
   - Optimization techniques
6. Implementation Details (10 minutes)
   - Connection pooling
   - Error handling
   - Monitoring

Key Phrases That Show Seniority

“Let’s evaluate both approaches”
“The trade-off here is…”
“At our scale, X matters more than Y”
“From an operational perspective…”
“The business context here is…”
“Let me quantify this with some numbers”

Practice Exercise

Try redesigning the online/offline system considering:

What if you have 100M concurrent users?
What if users are on mobile with poor connectivity?
What if you need 99.99% availability?
What if different features need different “online” definitions?

The key insight from your mentor: Senior engineers spend 80% of their time on design decisions and trade-offs, and 20% on implementation. Focus on developing this decision-making framework, and you’ll excel in system design interviews.
