Common System Design Interview Questions

Each question includes key discussion points, not just the answer. Interviewers want to see your thought process.

Classic System Design Problems

1. Design a URL Shortener (bit.ly)

Key Discussion Points:

Hashing: MD5/SHA256 → base62 encode → take first 7 chars. Handle collisions.
Custom short URLs: Check uniqueness before storing.
Storage: Key-value store (Redis for cache, DB for persistence). ~1.8 TB in 5 years (see Capacity Planning).
Redirect: 301 (permanent, browser caches) vs 302 (temporary, tracks every click). Use 302 for analytics.
Analytics: Click tracking with Kafka → async aggregation.
Read scale: Cache hot URLs in Redis (top 20% = 80% traffic).
Write scale: Low write QPS (~12 writes/sec for 100M DAU), no sharding needed initially.

Key Discussion Points:

Data model: User, Tweet, Follow, Feed tables.
Fan-out strategy: Fan-out-on-write (pre-populate followers' feeds) vs fan-out-on-read.
Celebrity problem: Beyoncé has 50M followers — fan-out-on-write is too slow. Use hybrid: fan-out-on-write for < 1M followers, fan-out-on-read for celebrities.
Timeline storage: Redis sorted set per user (ZADD user:feed:{userId} score tweetId).
Media: S3 + CDN for images/videos.
Search: Elasticsearch for full-text tweet search.
Scale: 100M DAU, ~1,000 write QPS, ~100,000 read QPS → need read replicas + caching.

3. Design YouTube / Video Streaming

Key Discussion Points:

Upload pipeline: Client → API → S3 raw → video processor (Lambda/worker) → encode to multiple resolutions (360p, 720p, 1080p, 4K) → S3 processed.
Video encoding: Transcode to different formats (H.264, AV1) using distributed workers (AWS Elastic Transcoder or custom FFmpeg workers).
Streaming: Adaptive bitrate streaming (HLS/DASH) — client switches quality based on bandwidth.
CDN: Videos served via CDN, not origin. Regional CDN PoPs.
Metadata: Video title, description, tags → Postgres. Views count → Redis INCR → async flush.
Recommendations: ML service, separate from core storage.
Storage estimation: 1M uploads/day × 300MB × 3 resolutions ≈ 900TB/day.

4. Design WhatsApp / Chat System

Key Discussion Points:

Message delivery: Client → WebSocket server → Kafka → recipient WebSocket server.
Message persistence: Store in Cassandra (write-heavy, time-ordered).
Offline delivery: If recipient offline → store in DB → deliver on reconnect.
Message ordering: Sequence number per conversation; Kafka partition per conversation.
End-to-end encryption: Key exchange on first message; server stores encrypted blobs only.
Group chat: Fan-out message to all group members; cap group size for simplicity.
Media: Presigned S3 URL for image/video; send URL in message.
Presence: Redis with TTL (SETEX user:online:{id} 60 1).
Scale: 500M DAU, 40 messages/day = 230,000 msg/s.

Key Discussion Points:

Upload: Presigned S3 URL → direct upload → notify API → async processing (resize, thumbnail, CDN).
Feed: Hybrid fan-out like Twitter. Cache feed in Redis.
Storage: Images on S3 + CloudFront CDN. Metadata in Postgres.
Stories: Separate from main feed. 24h TTL in Redis sorted set.
Explore/Discover: Recommendation engine, separate service.
Counters: Like/view counts in Redis; async flush to DB.

Key Discussion Points:

Location tracking: Drivers send GPS every 5s → WebSocket or HTTP → geospatial store.
Geospatial queries: Redis GEO commands (GEOADD, GEORADIUS) or PostGIS for "drivers near me."
Matching: Trip request → find N nearby drivers → send notification → driver accepts → create trip.
Surge pricing: ML model based on supply/demand per geo-cell.
Trip state machine: REQUESTED → DRIVER_ASSIGNED → DRIVER_EN_ROUTE → TRIP_STARTED → COMPLETED.
Payment: Async, Saga pattern across trip + payment + payout services.
ETA calculation: Road network graph (Dijkstra/A*) with real-time traffic data.

7. Design a Rate Limiter

Key Discussion Points:

Algorithms: Token bucket, sliding window log, sliding window counter.
Distributed implementation: Redis + Lua script for atomic check-and-decrement.
Keys: By API key, user ID, IP, or combination.
Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Retry-After.
Placement: API Gateway, service middleware, or library.
Multi-tier: Different limits per endpoint (e.g., /login stricter than /feed).

Redis sliding window:

MULTI
  ZADD key timestamp timestamp
  ZREMRANGEBYSCORE key -inf (now - window)
  ZCARD key
EXEC

8. Design a Notification System

Key Discussion Points:

Channels: Push (FCM/APNs), Email (SendGrid/SES), SMS (Twilio), In-app.
Flow: Event → Kafka → Notification Service → channel selection → delivery.
Channel selection: User preferences + event type → choose channel.
Templating: Template engine (Freemarker/Thymeleaf) for message rendering.
Delivery guarantees: At-least-once with idempotency. Track delivery status.
Retry: Exponential backoff for failed deliveries.
Rate limiting: Don't spam users — respect quiet hours, daily limits.
Batching: Email digest (hourly/daily) vs instant push.
Scale: 10M notifications/day = ~116/sec average, needs async pipeline.

9. Design a Web Crawler

Key Discussion Points:

Seed URLs: Start with known URLs, expand via discovered links.
BFS vs DFS: BFS better for breadth; prioritize fresh/important pages.
URL frontier: Priority queue (by importance/freshness) backed by disk.
Distributed crawlers: Partition URLs by domain hash across workers.
Politeness: Respect robots.txt. Rate limit per domain (1 req/sec).
Deduplication: Bloom filter for URL seen check. Content hash for page dedup.
Storage: HTML in S3; metadata/links in Cassandra.
Scale: 1B pages, 500 bytes avg → 500 GB storage. Crawl 1B pages in 30 days = 385 pages/sec.

10. Design a Search Autocomplete

Key Discussion Points:

Trie: Data structure for prefix matching. Too large to fit in memory at Google scale.
Top-K per prefix: Precompute top 10 queries for each prefix.
Data collection: Log search queries → aggregate frequency → build trie offline.
Update frequency: Rebuild trie weekly (offline) or use streaming aggregation.
Storage: Trie in Redis or Elasticsearch. Cache hot prefixes in memory.
Ranking: By frequency, freshness, personalization.
Latency: P99 < 50ms. Local cache for hot prefixes.

11. Design a Distributed Cache (Redis)

Key Discussion Points:

Architecture: Redis Cluster (sharding via consistent hashing across nodes).
Replication: Primary-replica per shard; replica promotes on primary failure.
Eviction: LRU for general cache; LFU for skewed access patterns.
Persistence: RDB (point-in-time snapshot) or AOF (append-only log) or both.
Partitioning: 16,384 hash slots divided across nodes.
Hot key problem: Single key overwhelms one node → replicate hot keys to multiple slots.
Consistency: Redis Cluster is AP — can have split-brain; use WAIT command for durability.

12. Design a Payment System

Key Discussion Points:

Idempotency: Idempotency keys on every payment API call. Essential.
Double-spend prevention: Pessimistic lock or database constraint on account.
Saga: Saga pattern across payment, inventory, fulfillment (see Saga Pattern Guide).
Reconciliation: Async job to compare internal records with payment gateway records.
Compliance: PCI-DSS — never store raw card numbers; use tokens from payment gateway.
Retry logic: Exponential backoff with idempotency keys to payment provider.
Ledger: Append-only ledger table (never update balances directly).
Exactly-once: DB-level idempotency check before processing.

13. Design a Leaderboard

Key Discussion Points:

Redis Sorted Set: ZADD leaderboard score userId. ZREVRANK for position. O(log N).
Global leaderboard: Single sorted set — works up to 100M+ entries in Redis.
Friends leaderboard: Intersect global sorted set with user's friend set.
Time windows: Separate sorted sets for daily/weekly/all-time. Expire daily set after 24h.
Score updates: ZINCRBY leaderboard delta userId — atomic increment.
Pagination: ZREVRANGE leaderboard 0 9 WITHSCORES for top 10.
Scale: Redis can handle 100K+ ZADD operations/sec.

14. Design a Key-Value Store

Key Discussion Points:

Data structures: Hash table for O(1) get/put. LSM-tree for disk-based (LevelDB, RocksDB).
Consistent hashing: Distribute keys across nodes. Virtual nodes for even distribution.
Replication: 3 replicas per key (configurable N). Quorum reads/writes (W + R > N).
Conflict resolution: Last-write-wins (timestamp) or vector clocks (causal).
Partitioning: Consistent hashing ring. Adding nodes → minimal data migration.
Gossip protocol: Node membership and failure detection.
Anti-entropy: Merkle trees for detecting and repairing inconsistencies between replicas.

Behavioral / Architecture Deep-Dive Questions

Trade-off Questions

When would you use a relational DB vs NoSQL? What's your decision framework?
When would you choose microservices over a monolith?
When is eventual consistency acceptable? When is it not?
How do you decide between synchronous and asynchronous communication?
When would you use a message queue vs direct API calls?

Operational Questions

How do you deploy a schema migration with zero downtime?
How do you debug a sudden latency spike in production?
How do you design a system with 99.99% availability?
How do you handle a database that's running out of disk space?
How do you approach capacity planning for a new feature?

Architecture Questions

How would you evolve a monolith into microservices incrementally?
How do you design for multi-tenancy?
How would you design a system that must work offline?
How do you design for GDPR compliance (data deletion, data portability)?
How would you design a geo-distributed system that serves users globally?

Interview Tips Summary

Phase	What Interviewers Look For
Requirements	Do you ask the right questions? Do you scope properly?
Estimation	Can you do back-of-envelope math? Do you validate assumptions?
High-level design	Is the design sound? Does it address the requirements?
Deep dive	Can you go deep on specific components? Do you know trade-offs?
Wrap-up	Do you identify weaknesses? Do you know what to monitor?

Red Flags to Avoid

Jumping to solutions without requirements
Designing a single-server system
Not acknowledging trade-offs
Silence — always think out loud
Designing perfect system before validating basics
Ignoring failure modes

Green Flags to Show

Explicit assumptions
Quantitative reasoning ("at 10,000 QPS, a single DB can handle this...")
Trade-off awareness ("I'm choosing X over Y because...")
Proactive failure mode discussion
Incrementally evolving the design

Quick Reference: Technology Selection

Need	Technology
Relational data, ACID	PostgreSQL
Document store	MongoDB
Key-value / cache	Redis
Column-family, time-series	Cassandra
Full-text search	Elasticsearch
Message queue	Kafka (streaming) / RabbitMQ (task queue)
Object storage	S3 / GCS
CDN	CloudFront / Fastly
Service mesh	Istio / Linkerd
Container orchestration	Kubernetes
Distributed lock	Redis / Zookeeper
API Gateway	Kong / AWS API Gateway / Spring Cloud Gateway

Classic System Design Problems​

1. Design a URL Shortener (bit.ly)​

2. Design Twitter / Social Feed​

3. Design YouTube / Video Streaming​

4. Design WhatsApp / Chat System​

5. Design Instagram / Photo Sharing​

6. Design Uber / Ride Sharing​

7. Design a Rate Limiter​

8. Design a Notification System​

9. Design a Web Crawler​

10. Design a Search Autocomplete​

11. Design a Distributed Cache (Redis)​

12. Design a Payment System​

13. Design a Leaderboard​

14. Design a Key-Value Store​

Behavioral / Architecture Deep-Dive Questions​

Trade-off Questions​

Operational Questions​

Architecture Questions​

Interview Tips Summary​

Red Flags to Avoid​

Green Flags to Show​

Quick Reference: Technology Selection​

Classic System Design Problems

1. Design a URL Shortener (bit.ly)

2. Design Twitter / Social Feed

3. Design YouTube / Video Streaming

4. Design WhatsApp / Chat System

5. Design Instagram / Photo Sharing

6. Design Uber / Ride Sharing

7. Design a Rate Limiter

8. Design a Notification System

9. Design a Web Crawler

10. Design a Search Autocomplete

11. Design a Distributed Cache (Redis)

12. Design a Payment System

13. Design a Leaderboard

14. Design a Key-Value Store

Behavioral / Architecture Deep-Dive Questions

Trade-off Questions

Operational Questions

Architecture Questions

Interview Tips Summary

Red Flags to Avoid

Green Flags to Show

Quick Reference: Technology Selection