System Design Interview Framework
Interviewers care more about how you think than the final architecture. Structure is everything.
The RADIO Framework
| Step | Time | Goal |
|---|---|---|
| Requirements | 5 min | Define scope & constraints |
| API Design | 5 min | Define the interface |
| Data Model | 5 min | Define storage schema |
| Initial Design | 5–10 min | High-level diagram |
| Optimizations | 10–15 min | Deep dives on bottlenecks |
Step 1 - Requirements Clarification (5 min)
Never start designing without asking these questions.
Functional Requirements
- What are the core features? (List top 3)
- What's out of scope?
- Who are the users?
Non-Functional Requirements
- Scale: DAU, QPS, data volume
- Latency target (p99 < 200ms?)
- Availability (99.9%, 99.99%?)
- Consistency level (strong, eventual?)
- Geo-distribution needed?
- Read/write ratio
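It can help to translate an availability target into a concrete downtime budget before committing to it. The sketch below is illustrative only: the helper name `downtime_budget_seconds` is an assumption, and it uses a 365-day year.

```python
# Hypothetical helper: converts an availability target into a yearly downtime budget.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years


def downtime_budget_seconds(availability_percent: float) -> float:
    """Return the downtime allowed per year, in seconds, for a given target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    minutes = downtime_budget_seconds(target) / 60
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Each extra "nine" cuts the budget by a factor of ten, which is usually what drives decisions about redundancy and failover.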
Good Questions to Ask
- "How many daily active users should I design for?"
- "Is this read-heavy or write-heavy?"
- "Do we need to support mobile clients?"
- "What's the acceptable latency for the core operation?"
- "Do we need global distribution?"
- "How long should data be retained?"
Step 2 - API Design (5 min)
Define the public interface before the internals.
REST Example
POST /v1/posts - Create a post
GET /v1/posts/{id} - Get a post
GET /v1/users/{id}/feed - Get user feed
PUT /v1/posts/{id} - Update a post
DELETE /v1/posts/{id} - Delete a post
Key Decisions
- Request/response shape
- Authentication mechanism (JWT, OAuth)
- Pagination strategy (cursor vs offset)
- Rate limiting boundaries
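To make the cursor option concrete, here is a minimal sketch of cursor-based pagination over an in-memory list. The `POSTS` data, the function names, and the cursor format (a base64-encoded last-seen id) are all assumptions for illustration; a real service would run the equivalent query against its database.

```python
import base64
import json
from typing import Optional

# Hypothetical in-memory data standing in for a posts table, sorted by id.
POSTS = [{"id": i, "content": f"post {i}"} for i in range(1, 8)]


def encode_cursor(last_id: int) -> str:
    """Pack the last-seen id into an opaque token for the client."""
    raw = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the last-seen id from the opaque token."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def list_posts(cursor: Optional[str] = None, limit: int = 3) -> dict:
    """Return one page of posts plus the cursor for the next page (None at the end)."""
    last_id = decode_cursor(cursor) if cursor else 0
    # Roughly: SELECT ... WHERE id > :last_id ORDER BY id LIMIT :limit + 1
    rows = [p for p in POSTS if p["id"] > last_id][: limit + 1]
    page, has_more = rows[:limit], len(rows) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"items": page, "next_cursor": next_cursor}


first = list_posts()
second = list_posts(first["next_cursor"])
print([p["id"] for p in first["items"]], [p["id"] for p in second["items"]])
```

Unlike offset pagination, the cursor keeps pages stable when rows are inserted or deleted between requests, and the lookup does not slow down on deep pages.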
Step 3 - Data Model (5 min)
Define entities and relationships before choosing storage.
User { id, name, email, created_at }
Post { id, user_id, content, created_at }
Follow { follower_id, followee_id, created_at }
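One way to sanity-check these entities is to write them as a relational schema and run the core query against it. The sketch below uses SQLite from the Python standard library; the table names, column types, index, and sample rows are assumptions chosen for illustration, not a recommended production schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE,
        created_at TEXT NOT NULL
    );
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        content TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL REFERENCES users(id),
        followee_id INTEGER NOT NULL REFERENCES users(id),
        created_at TEXT NOT NULL,
        PRIMARY KEY (follower_id, followee_id)
    );
    -- Supports "latest posts by these authors", the feed access pattern.
    CREATE INDEX idx_posts_user_created ON posts (user_id, created_at);
    """
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "alice@example.com", "2024-01-01"),
        (2, "bob", "bob@example.com", "2024-01-02"),
        (3, "carol", "carol@example.com", "2024-01-03"),
    ],
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (1, 2, "hello from bob", "2024-01-04"),
        (2, 3, "hello from carol", "2024-01-05"),
    ],
)
conn.execute("INSERT INTO follows VALUES (1, 2, '2024-01-04')")  # alice follows bob


def feed(user_id: int, limit: int = 10) -> list:
    """Return the newest posts written by accounts that user_id follows."""
    return conn.execute(
        """
        SELECT p.id, p.content
        FROM posts p
        JOIN follows f ON f.followee_id = p.user_id
        WHERE f.follower_id = ?
        ORDER BY p.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()


print(feed(1))
```

Writing the feed query this early shows that the feed is a join across the follow graph, which is exactly the part that tends to need caching or precomputation at scale.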
Storage Selection Guide
| Use Case | Storage Choice |
|---|---|
| Structured relational data | PostgreSQL / MySQL |
| Unstructured / flexible schema | MongoDB |
| Key-value / cache | Redis |
| Time-series data | InfluxDB / TimescaleDB |
| Full-text search | Elasticsearch |
| Large files / blobs | S3 / GCS |
| Graph relationships | Neo4j |
| Column-family (wide table) | Cassandra / HBase |
Step 4 - High-Level Design (5–10 min)
Draw boxes and arrows. Keep it simple at first.
Client → Load Balancer → API Gateway → Service → DB
↓
Cache (Redis)
↓
Message Queue (Kafka)
↓
Worker Service
Always Include
- Load balancer (never one server)
- Database (specify type)
- Cache layer
- CDN (if media/static assets involved)
- Async processing (if writes are heavy)
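When you mention the cache layer, it helps to name the read and write path explicitly. Below is a minimal cache-aside sketch in which plain dictionaries stand in for the database and for Redis; the class name `CacheAsideStore`, the TTL value, and the `db_reads` counter are illustrative assumptions.

```python
import time
from typing import Optional


class CacheAsideStore:
    """Cache-aside: read through the cache, invalidate the cache on writes."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.db = {"post:1": "hello world"}  # stand-in for the database
        self.cache = {}                      # stand-in for Redis: key -> (value, expires_at)
        self.ttl_seconds = ttl_seconds
        self.db_reads = 0                    # counts how often the database is hit

    def _read_db(self, key: str) -> Optional[str]:
        self.db_reads += 1
        return self.db.get(key)

    def get(self, key: str) -> Optional[str]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                  # cache hit
        value = self._read_db(key)           # miss or expired: fall back to the database
        if value is not None:
            self.cache[key] = (value, time.monotonic() + self.ttl_seconds)
        return value

    def update(self, key: str, value: str) -> None:
        self.db[key] = value
        self.cache.pop(key, None)            # invalidate so the next read reloads


store = CacheAsideStore()
store.get("post:1")
store.get("post:1")
print("database reads after two gets:", store.db_reads)
```

Invalidating on write rather than updating the cache in place keeps the write path simple; the trade-off is one extra database read after each update.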
Step 5 - Deep Dive / Optimizations (10–15 min)
The interviewer will guide this. Common deep dives:
| Problem | Solution to Discuss |
|---|---|
| High read QPS | Caching, read replicas, CDN |
| High write QPS | Sharding, write-ahead log, async queue |
| Hotspot (celebrity user) | Special-case high-follower accounts, e.g. fan-out-on-read instead of fan-out-on-write |
| Large payloads | Chunking, object storage, presigned URLs |
| Real-time requirements | WebSocket, SSE, long polling |
| Exactly-once semantics | Idempotency keys, deduplication |
| Long-running jobs | Job queue, progress API, async callbacks |
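As an example of how one of these rows turns into a concrete mechanism, here is a minimal sketch of idempotency keys. The `PaymentService` class and its fields are hypothetical, and the in-memory dictionary stands in for durable storage; in a real system the check-and-store step would need to be atomic, for example via a unique constraint on the key.

```python
import uuid


class PaymentService:
    """Hypothetical service that deduplicates retried requests by idempotency key."""

    def __init__(self):
        self._responses = {}  # idempotency key -> response already returned
        self.charges = []     # side effects that must not be repeated

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a known key returns the stored response, with no new side effect.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        charge_id = len(self.charges) + 1
        self.charges.append((charge_id, amount_cents))
        response = {"charge_id": charge_id, "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())             # the client generates one key per logical request
first = service.charge(key, 500)
retry = service.charge(key, 500)    # e.g. the client timed out and retried
print(first == retry, len(service.charges))
```

This gives effectively-once processing on top of at-least-once delivery: the request may arrive several times, but the charge is applied once.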
Communication Tips
Do
- Think out loud at all times
- State assumptions explicitly
- Estimate before architecting
- Mention trade-offs for every decision
- Ask "does this align with what you're looking for?"
Don't
- Jump to solutions before requirements
- Over-engineer the first design
- Stay silent while thinking
- Ignore non-functional requirements
- Forget to mention failure scenarios
Common Pitfalls
| Pitfall | Fix |
|---|---|
| Designing a single-server system | Always start with load balancer + multiple instances |
| Ignoring failure modes | Ask "what happens if this component fails?" |
| No capacity estimation | Do rough math before picking technology |
| Picking tech without justification | "I choose Kafka here because we need durable message delivery at scale, with ordering within each partition" |
| Skipping the data model | Schema design surfaces hidden complexity early |
| Not discussing consistency trade-offs | State your consistency model explicitly |
Sample Opening Structure
"Let me start by clarifying requirements. Based on what you said, the core features are: [X, Y, Z]. I'll treat [A, B] as out of scope for now. For scale, I'll assume 10M DAU, with a 10:1 read/write ratio. Let me do a quick estimation before we dive into the design..."
Interview Questions
Q: How would you approach designing a system you've never built before?
A: I follow a repeatable flow: clarify requirements, define scope, estimate traffic/storage, propose a simple high-level architecture, then drill into bottlenecks and trade-offs. I make assumptions explicit and validate them with the interviewer before going deeper.
Q: If an interviewer asks you to "design Twitter," what are the first 5 questions you ask?
A:
- What exact features are in scope (posting, timeline, follow graph, search, notifications)?
- What scale should we design for (DAU, QPS, peak factor, geo distribution)?
- What read/write ratio do we expect?
- What are latency and availability targets?
- Are there consistency requirements (e.g., timeline freshness, exactly-once delivery) and cost constraints?
Q: How do you decide whether to use SQL or NoSQL for a new system?
A: I choose based on data shape and access patterns. SQL is preferred for relational data, strong consistency, and complex joins/transactions. NoSQL is preferred for horizontal scale, flexible schema, and high-throughput key-value/document access. Many real systems use both, each for the part it fits best.
Q: How do you handle the trade-off between consistency and availability in your design?
A: I classify operations by business criticality. Critical paths (payments, account balance) favor stronger consistency. User-experience paths (feeds, counters, analytics) can accept eventual consistency for higher availability and lower latency. I define failure behavior up front: what can degrade, what must block, and why.
Q: Walk me through how you'd estimate QPS for a feature that 1% of 100M users will use daily.
A:
- Daily active users for the feature: $100,000,000 \times 1\% = 1,000,000$.
- Average QPS across a day, assuming one request per user per day: $1,000,000 / 86,400 \approx 11.6$ QPS.
- Apply a peak factor (for example 10x): peak QPS is about $116$.
- Add a safety margin (for example 2x) for bursts and retries: design target of about $200$–$250$ QPS.
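The same arithmetic can be written as a small function so the assumptions are easy to vary. This is a sketch: the function name, the one-request-per-user default, and the 10x and 2x factors are assumptions taken from the example figures above, not fixed rules.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(
    total_users: int,
    adoption_rate: float,
    requests_per_user_per_day: float = 1.0,  # assumption: one request per user per day
    peak_factor: float = 10.0,               # assumption: peak traffic is 10x the average
    safety_margin: float = 2.0,              # assumption: 2x headroom for bursts and retries
) -> dict:
    """Back-of-the-envelope QPS estimate for a feature."""
    daily_requests = total_users * adoption_rate * requests_per_user_per_day
    average = daily_requests / SECONDS_PER_DAY
    peak = average * peak_factor
    return {"average": average, "peak": peak, "design_target": peak * safety_margin}


estimate = estimate_qps(total_users=100_000_000, adoption_rate=0.01)
print({name: round(value, 1) for name, value in estimate.items()})
```

If each user makes several requests per day, raise `requests_per_user_per_day` and every downstream number scales linearly with it.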