Common System Design Interview Questions

Each question includes key discussion points, not just the answer. Interviewers want to see your thought process.


Classic System Design Problems​

1. Design a URL Shortener (bit.ly)​

Key Discussion Points:

  • Hashing: MD5/SHA256 → base62 encode → take first 7 chars. Handle collisions.
  • Custom short URLs: Check uniqueness before storing.
  • Storage: Key-value store (Redis for cache, DB for persistence). ~1.8 TB in 5 years (see Capacity Planning).
  • Redirect: 301 (permanent, browser caches) vs 302 (temporary, tracks every click). Use 302 for analytics.
  • Analytics: Click tracking with Kafka → async aggregation.
  • Read scale: Cache hot URLs in Redis (top 20% = 80% traffic).
  • Write scale: Low write QPS (~12 writes/sec for 100M DAU), no sharding needed initially.
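
The hashing bullet above can be sketched as follows. This is a minimal illustration, not a production design: the in-memory `store` dict stands in for the key-value store, and the salt-and-retry collision strategy is one assumed option among several.

```python
import hashlib

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer with the 62-character alphabet."""
    if number == 0:
        return BASE62[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def shorten(long_url: str, store: dict, code_length: int = 7) -> str:
    """Return a short code for long_url, retrying with a salt on collision."""
    attempt = 0
    while True:
        # Salting with the attempt number yields a different hash after a collision.
        payload = f"{long_url}#{attempt}".encode("utf-8")
        digest = hashlib.sha256(payload).digest()
        code = base62_encode(int.from_bytes(digest, "big"))[:code_length]
        existing = store.get(code)
        if existing is None:
            store[code] = long_url  # code is free: claim it
            return code
        if existing == long_url:
            return code  # same URL was shortened before: reuse its code
        attempt += 1  # code belongs to another URL: try the next salt
```

Seven base62 characters give 62^7 ≈ 3.5 trillion possible codes, which is why collisions are rare but still need handling.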

2. Design Twitter / Social Feed​

Key Discussion Points:

  • Data model: User, Tweet, Follow, Feed tables.
  • Fan-out strategy: Fan-out-on-write (pre-populate followers' feeds) vs fan-out-on-read.
  • Celebrity problem: Beyoncé has 50M followers, so fan-out-on-write is too slow. Use a hybrid: fan-out-on-write for < 1M followers, fan-out-on-read for celebrities.
  • Timeline storage: Redis sorted set per user (ZADD user:feed:{userId} score tweetId).
  • Media: S3 + CDN for images/videos.
  • Search: Elasticsearch for full-text tweet search.
  • Scale: 100M DAU, ~1,000 write QPS, ~100,000 read QPS → need read replicas + caching.
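
A rough in-memory sketch of the hybrid fan-out strategy described above. The class and method names are hypothetical, and plain dicts stand in for the Redis feeds and the tweet store.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # follower cutoff taken from the notes above


class FeedService:
    def __init__(self, celebrity_threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author -> followers
        self.following = defaultdict(set)          # user -> authors followed
        self.feeds = defaultdict(list)             # user -> [(timestamp, tweet_id)] pushed at write time
        self.tweets_by_author = defaultdict(list)  # author -> [(timestamp, tweet_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, tweet_id, timestamp):
        self.tweets_by_author[author].append((timestamp, tweet_id))
        if not self.is_celebrity(author):
            # Fan-out-on-write: push into every follower's precomputed feed.
            for follower in self.followers[author]:
                self.feeds[follower].append((timestamp, tweet_id))

    def timeline(self, user, limit=10):
        # Start from the precomputed feed; a set removes duplicates for
        # authors who crossed the threshold after some tweets were pushed.
        entries = set(self.feeds[user])
        # Fan-out-on-read: pull celebrity tweets at request time.
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.tweets_by_author[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]
```

The write path stays cheap for celebrities, at the cost of a merge step on every read by their followers.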

3. Design YouTube / Video Streaming​

Key Discussion Points:

  • Upload pipeline: Client → API → S3 raw → video processor (Lambda/worker) → encode to multiple resolutions (360p, 720p, 1080p, 4K) → S3 processed.
  • Video encoding: Transcode to different formats (H.264, AV1) using distributed workers (AWS Elastic Transcoder or custom FFmpeg workers).
  • Streaming: Adaptive bitrate streaming (HLS/DASH): the client switches quality based on bandwidth.
  • CDN: Videos served via CDN, not origin. Regional CDN PoPs.
  • Metadata: Video title, description, tags → Postgres. View counts → Redis INCR → async flush.
  • Recommendations: ML service, separate from core storage.
  • Storage estimation: 1M uploads/day × 300 MB × 3 resolutions ≈ 900 TB/day.
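
The storage estimate above is simple enough to check in a few lines. The sketch below uses decimal units (1 TB = 1,000,000 MB) and treats the three inputs from the notes as assumptions to be varied during the interview.

```python
UPLOADS_PER_DAY = 1_000_000   # from the notes above
AVG_VIDEO_MB = 300            # average size of one encoded rendition
RENDITIONS = 3                # resolutions stored per upload


def daily_storage_tb(uploads: int, avg_mb: float, renditions: int) -> float:
    """Storage added per day in TB, using 1 TB = 1,000,000 MB."""
    return uploads * avg_mb * renditions / 1_000_000


def yearly_storage_pb(daily_tb: float) -> float:
    """Storage added per year in PB, using 1 PB = 1,000 TB."""
    return daily_tb * 365 / 1_000


daily = daily_storage_tb(UPLOADS_PER_DAY, AVG_VIDEO_MB, RENDITIONS)
yearly = yearly_storage_pb(daily)
```

With these inputs `daily` is 900 TB and `yearly` is 328.5 PB, before replication or deduplication is taken into account.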

4. Design WhatsApp / Chat System​

Key Discussion Points:

  • Message delivery: Client → WebSocket server → Kafka → recipient WebSocket server.
  • Message persistence: Store in Cassandra (write-heavy, time-ordered).
  • Offline delivery: If recipient offline → store in DB → deliver on reconnect.
  • Message ordering: Sequence number per conversation; Kafka partition per conversation.
  • End-to-end encryption: Key exchange on first message; server stores encrypted blobs only.
  • Group chat: Fan-out message to all group members; cap group size for simplicity.
  • Media: Presigned S3 URL for image/video; send URL in message.
  • Presence: Redis with TTL (SETEX user:online:{id} 60 1).
  • Scale: 500M DAU × 40 messages/user/day = 20B messages/day ≈ 230,000 msg/s on average.
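
One way to picture the message-ordering bullet: route every message of a conversation to the same partition and stamp it with a per-conversation sequence number. The sketch below is an in-memory stand-in. `NUM_PARTITIONS`, `Sequencer`, and the CRC32 routing are illustrative assumptions, not Kafka's own partitioner.

```python
import zlib
from collections import defaultdict
from itertools import count

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_for(conversation_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """A stable hash keeps all messages of one conversation on one partition."""
    return zlib.crc32(conversation_id.encode("utf-8")) % num_partitions


class Sequencer:
    """Hands out increasing sequence numbers, independently per conversation."""

    def __init__(self):
        self._counters = defaultdict(lambda: count(1))

    def next_seq(self, conversation_id: str) -> int:
        return next(self._counters[conversation_id])


def in_order(messages):
    """Sort received messages (dicts with a 'seq' field) for display."""
    return sorted(messages, key=lambda message: message["seq"])
```

Ordering is only guaranteed within a conversation, which is usually all a chat client needs.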

5. Design Instagram / Photo Sharing​

Key Discussion Points:

  • Upload: Presigned S3 URL → direct upload → notify API → async processing (resize, thumbnail, CDN).
  • Feed: Hybrid fan-out like Twitter. Cache feed in Redis.
  • Storage: Images on S3 + CloudFront CDN. Metadata in Postgres.
  • Stories: Separate from main feed. 24h TTL in Redis sorted set.
  • Explore/Discover: Recommendation engine, separate service.
  • Counters: Like/view counts in Redis; async flush to DB.

6. Design Uber / Ride Sharing​

Key Discussion Points:

  • Location tracking: Drivers send GPS every 5s → WebSocket or HTTP → geospatial store.
  • Geospatial queries: Redis GEO commands (GEOADD, GEORADIUS; GEOSEARCH replaces GEORADIUS from Redis 6.2) or PostGIS for "drivers near me."
  • Matching: Trip request → find N nearby drivers → send notification → driver accepts → create trip.
  • Surge pricing: ML model based on supply/demand per geo-cell.
  • Trip state machine: REQUESTED → DRIVER_ASSIGNED → DRIVER_EN_ROUTE → TRIP_STARTED → COMPLETED.
  • Payment: Async, Saga pattern across trip + payment + payout services.
  • ETA calculation: Road network graph (Dijkstra/A*) with real-time traffic data.
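
The trip state machine above can be enforced with an explicit transition table. This sketch covers only the happy path listed in the notes. A real system would also need cancellation and timeout states, which are left out here because the notes do not define them.

```python
from enum import Enum


class TripState(Enum):
    REQUESTED = "REQUESTED"
    DRIVER_ASSIGNED = "DRIVER_ASSIGNED"
    DRIVER_EN_ROUTE = "DRIVER_EN_ROUTE"
    TRIP_STARTED = "TRIP_STARTED"
    COMPLETED = "COMPLETED"


# Each state maps to the set of states it may move to next.
ALLOWED_TRANSITIONS = {
    TripState.REQUESTED: {TripState.DRIVER_ASSIGNED},
    TripState.DRIVER_ASSIGNED: {TripState.DRIVER_EN_ROUTE},
    TripState.DRIVER_EN_ROUTE: {TripState.TRIP_STARTED},
    TripState.TRIP_STARTED: {TripState.COMPLETED},
    TripState.COMPLETED: set(),  # terminal state
}


def advance(current: TripState, target: TripState) -> TripState:
    """Return target if the transition is legal, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Rejecting illegal transitions in one place keeps retried or out-of-order events from corrupting trip state.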

7. Design a Rate Limiter​

Key Discussion Points:

  • Algorithms: Token bucket, sliding window log, sliding window counter.
  • Distributed implementation: Redis + Lua script for atomic check-and-decrement.
  • Keys: By API key, user ID, IP, or combination.
  • Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset; send the standard Retry-After header with 429 responses.
  • Placement: API Gateway, service middleware, or library.
  • Multi-tier: Different limits per endpoint (e.g., /login stricter than /feed).
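  • Redis sliding window (one sorted set per client key; score = request timestamp, member = a unique request ID so requests in the same instant are not merged):
    MULTI
    ZREMRANGEBYSCORE key -inf (now - window)
    ZADD key now requestId
    ZCARD key
    EXEC
    Allow the request if the ZCARD result is within the limit.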
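
The same sliding window log idea, modelled in memory so the logic is easy to follow. This is a single-process sketch with a hypothetical `SlidingWindowLog` class. It assumes timestamps arrive in non-decreasing order, and unlike the Redis transaction it does not record rejected requests.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._log = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        log = self._log[key]
        cutoff = now - self.window
        # Equivalent of ZREMRANGEBYSCORE key -inf (now - window).
        while log and log[0] <= cutoff:
            log.popleft()
        # Equivalent of ZCARD: how many requests remain inside the window.
        if len(log) >= self.limit:
            return False
        # Equivalent of ZADD: record this request.
        log.append(now)
        return True
```

For example, `SlidingWindowLog(limit=2, window_seconds=10)` accepts requests at t=0 and t=1, rejects one at t=2, and accepts again at t=10 once the first entry has aged out.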

8. Design a Notification System​

Key Discussion Points:

  • Channels: Push (FCM/APNs), Email (SendGrid/SES), SMS (Twilio), In-app.
  • Flow: Event → Kafka → Notification Service → channel selection → delivery.
  • Channel selection: User preferences + event type → choose channel.
  • Templating: Template engine (Freemarker/Thymeleaf) for message rendering.
  • Delivery guarantees: At-least-once with idempotency. Track delivery status.
  • Retry: Exponential backoff for failed deliveries.
  • Rate limiting: Don't spam users: respect quiet hours and daily limits.
  • Batching: Email digest (hourly/daily) vs instant push.
  • Scale: 10M notifications/day = ~116/sec average, needs async pipeline.
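
The delivery-guarantee and retry bullets combine naturally: retry with exponential backoff, and use the notification ID as an idempotency key so a redelivered event is not sent twice. In this sketch the `send` callable, the `delivered_ids` set, and the choice of `ConnectionError` as the retryable failure are all assumptions for illustration.

```python
import random
import time


def backoff_delay(attempt: int, base_seconds: float = 1.0, cap_seconds: float = 60.0) -> float:
    """Upper bound on the wait after failed attempt number `attempt` (0-based)."""
    return min(cap_seconds, base_seconds * (2 ** attempt))


def deliver(notification_id, send, delivered_ids, max_attempts=5, sleep=time.sleep):
    """Send once per notification_id, retrying transient failures with backoff."""
    if notification_id in delivered_ids:
        return "duplicate"  # idempotency check: already delivered
    for attempt in range(max_attempts):
        try:
            send(notification_id)
        except ConnectionError:
            if attempt < max_attempts - 1:
                # Full jitter: wait a random time up to the exponential bound.
                sleep(random.uniform(0, backoff_delay(attempt)))
            continue
        delivered_ids.add(notification_id)
        return "delivered"
    return "failed"
```

In a real pipeline the delivered-ID set would live in a shared store, and permanently failed notifications would go to a dead-letter queue.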

9. Design a Web Crawler​

Key Discussion Points:

  • Seed URLs: Start with known URLs, expand via discovered links.
  • BFS vs DFS: BFS better for breadth; prioritize fresh/important pages.
  • URL frontier: Priority queue (by importance/freshness) backed by disk.
  • Distributed crawlers: Partition URLs by domain hash across workers.
  • Politeness: Respect robots.txt. Rate limit per domain (1 req/sec).
  • Deduplication: Bloom filter for URL seen check. Content hash for page dedup.
  • Storage: HTML in S3; metadata/links in Cassandra.
  • Scale: 1B pages, 500 bytes avg → 500 GB storage. Crawl 1B pages in 30 days ≈ 386 pages/sec.
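
A small Bloom filter for the URL-seen check mentioned above. The bit-array size and hash count are illustrative defaults, not tuned values, and the double-hashing scheme derived from SHA-256 is one common construction assumed here.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one digest
        # (double hashing: h1 + i * h2).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids a zero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means definitely unseen; True means probably seen.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A false positive makes the crawler skip a page it has not fetched, so the filter is sized to keep that rate acceptably low. False negatives cannot occur.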

10. Design a Search Autocomplete​

Key Discussion Points:

  • Trie: Data structure for prefix matching. Too large to fit in memory at Google scale.
  • Top-K per prefix: Precompute top 10 queries for each prefix.
  • Data collection: Log search queries → aggregate frequency → build trie offline.
  • Update frequency: Rebuild trie weekly (offline) or use streaming aggregation.
  • Storage: Trie in Redis or Elasticsearch. Cache hot prefixes in memory.
  • Ranking: By frequency, freshness, personalization.
  • Latency: P99 < 50ms. Local cache for hot prefixes.
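
The trie and top-K bullets can be combined as sketched below: an offline build stores the K most frequent queries at every prefix node, so a lookup is a walk down the trie with no ranking at request time. The function names and the tie-breaking rule (alphabetical) are assumptions.

```python
class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}
        self.top = []  # (count, query) pairs, pruned to the top K after the build


def build_trie(query_counts: dict, k: int = 10) -> TrieNode:
    """Offline build: attach every query to each of its prefixes, then keep top K."""
    root = TrieNode()
    for query, count in query_counts.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((count, query))
    stack = [root]
    while stack:
        node = stack.pop()
        # Highest count first; ties broken alphabetically for determinism.
        node.top = sorted(node.top, key=lambda entry: (-entry[0], entry[1]))[:k]
        stack.extend(node.children.values())
    return root


def suggest(root: TrieNode, prefix: str) -> list:
    """Return the precomputed top queries for prefix, or [] if none match."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return [query for _, query in node.top]
```

Precomputing per prefix trades memory and build time for constant work per lookup, which is what makes a tight latency budget realistic.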

11. Design a Distributed Cache (Redis)​

Key Discussion Points:

  • Architecture: Redis Cluster (sharding via fixed hash slots, CRC16(key) mod 16384, rather than a consistent-hashing ring).
  • Replication: Primary-replica per shard; replica promotes on primary failure.
  • Eviction: LRU for general cache; LFU for skewed access patterns.
  • Persistence: RDB (point-in-time snapshot) or AOF (append-only log) or both.
  • Partitioning: 16,384 hash slots divided across nodes.
  • Hot key problem: Single key overwhelms one node → replicate hot keys to multiple slots.
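  • Consistency: Redis Cluster replicates asynchronously and favors availability, so acknowledged writes can be lost during failover or a network partition. WAIT narrows that window by blocking until replicas acknowledge, but it does not make the cluster strongly consistent.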
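
To make the 16,384 hash slots concrete: Redis Cluster assigns a key to slot CRC16(key) mod 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The sketch below reproduces that mapping using the standard library's `binascii.crc_hqx`, which computes the same CRC-CCITT (XMODEM) variant when seeded with 0. The function names are illustrative.

```python
import binascii

HASH_SLOTS = 16384


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Slot number in [0, 16383] for the given key."""
    data = hash_tag(key.encode("utf-8"))
    return binascii.crc_hqx(data, 0) % HASH_SLOTS
```

Keys such as `{user1000}.following` and `{user1000}.followers` land in the same slot, which is how multi-key operations stay on a single node.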

12. Design a Payment System​

Key Discussion Points:

  • Idempotency: Idempotency keys on every payment API call. Essential.
  • Double-spend prevention: Pessimistic lock or database constraint on account.
  • Saga: Saga pattern across payment, inventory, fulfillment (see Saga Pattern Guide).
  • Reconciliation: Async job to compare internal records with payment gateway records.
  • Compliance: PCI-DSS: never store raw card numbers; use tokens from the payment gateway.
  • Retry logic: Exponential backoff with idempotency keys to payment provider.
  • Ledger: Append-only ledger table (never update balances directly).
  • Exactly-once: DB-level idempotency check before processing.
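
The idempotency, ledger, and exactly-once bullets can be illustrated together with a unique constraint on the idempotency key in an append-only table. The sketch uses an in-memory SQLite database purely for demonstration. The table and column names are hypothetical.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ledger (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               idempotency_key TEXT NOT NULL UNIQUE,
               account_id TEXT NOT NULL,
               amount_cents INTEGER NOT NULL
           )"""
    )
    return conn


def record_payment(conn, idempotency_key: str, account_id: str, amount_cents: int) -> bool:
    """Append one ledger entry; return False if this key was already processed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ledger (idempotency_key, account_id, amount_cents) VALUES (?, ?, ?)",
                (idempotency_key, account_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # UNIQUE constraint hit: duplicate request, nothing written


def balance_cents(conn, account_id: str) -> int:
    """Balance is derived by summing entries, never stored and updated in place."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM ledger WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]
```

Letting the database enforce uniqueness avoids the race in a separate check-then-insert, where two concurrent retries could both pass the check.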

13. Design a Leaderboard​

Key Discussion Points:

  • Redis Sorted Set: ZADD leaderboard score userId. ZREVRANK for position. O(log N).
  • Global leaderboard: Single sorted set; works up to 100M+ entries in Redis.
  • Friends leaderboard: Intersect global sorted set with user's friend set.
  • Time windows: Separate sorted sets for daily/weekly/all-time. Expire daily set after 24h.
  • Score updates: ZINCRBY leaderboard delta userId for an atomic increment.
  • Pagination: ZREVRANGE leaderboard 0 9 WITHSCORES for top 10.
  • Scale: Redis can handle 100K+ ZADD operations/sec.
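
For whiteboard purposes, the sorted-set operations above can be modelled with a plain dict. This sketch mirrors the semantics of ZADD, ZINCRBY, ZREVRANGE, and ZREVRANK only. It sorts on every read, so it does not reproduce the O(log N) cost of the skip list Redis uses, and the `friends_top` helper is a hypothetical stand-in for the friends-leaderboard intersection.

```python
class Leaderboard:
    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        self._scores[member] = score

    def zincrby(self, member, delta):
        self._scores[member] = self._scores.get(member, 0) + delta
        return self._scores[member]

    def _ranked(self, members=None):
        items = self._scores.items()
        if members is not None:
            items = [(m, s) for m, s in items if m in members]
        # Highest score first; equal scores in reverse member order, as in Redis.
        return sorted(items, key=lambda pair: (pair[1], pair[0]), reverse=True)

    def zrevrange(self, start, stop):
        """Inclusive 0-based range, highest score first (negative indexes not modelled)."""
        return self._ranked()[start:stop + 1]

    def zrevrank(self, member):
        for rank, (candidate, _) in enumerate(self._ranked()):
            if candidate == member:
                return rank
        return None

    def friends_top(self, friends, k=10):
        """Top k among the given friends only."""
        return self._ranked(set(friends))[:k]
```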

14. Design a Key-Value Store​

Key Discussion Points:

  • Data structures: Hash table for O(1) get/put. LSM-tree for disk-based (LevelDB, RocksDB).
  • Consistent hashing: Distribute keys across nodes. Virtual nodes for even distribution.
  • Replication: 3 replicas per key (configurable N). Quorum reads/writes (W + R > N).
  • Conflict resolution: Last-write-wins (timestamp) or vector clocks (causal).
  • Partitioning: Consistent hashing ring. Adding nodes → minimal data migration.
  • Gossip protocol: Node membership and failure detection.
  • Anti-entropy: Merkle trees for detecting and repairing inconsistencies between replicas.
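
A compact sketch of the consistent hashing ring with virtual nodes, plus the quorum condition from the replication bullet. The `HashRing` class, the 100 virtual nodes per physical node, and the use of SHA-256 for ring positions are illustrative assumptions.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many small arcs, which evens out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def replicas_for(self, key: str, n: int = 3) -> list:
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._ring, (_position(key), ""))
        result = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


def has_quorum_overlap(n: int, w: int, r: int) -> bool:
    """True when every read set intersects every write set (W + R > N)."""
    return w + r > n
```

Adding a node only moves the keys that fall on the new node's arcs. Every other key keeps its previous owner, which is the minimal-migration property the notes refer to.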

Behavioral / Architecture Deep-Dive Questions​

Trade-off Questions​

  1. When would you use a relational DB vs NoSQL? What's your decision framework?
  2. When would you choose microservices over a monolith?
  3. When is eventual consistency acceptable? When is it not?
  4. How do you decide between synchronous and asynchronous communication?
  5. When would you use a message queue vs direct API calls?

Operational Questions​

  1. How do you deploy a schema migration with zero downtime?
  2. How do you debug a sudden latency spike in production?
  3. How do you design a system with 99.99% availability?
  4. How do you handle a database that's running out of disk space?
  5. How do you approach capacity planning for a new feature?

Architecture Questions​

  1. How would you evolve a monolith into microservices incrementally?
  2. How do you design for multi-tenancy?
  3. How would you design a system that must work offline?
  4. How do you design for GDPR compliance (data deletion, data portability)?
  5. How would you design a geo-distributed system that serves users globally?

Interview Tips Summary​

Phase | What Interviewers Look For
Requirements | Do you ask the right questions? Do you scope properly?
Estimation | Can you do back-of-envelope math? Do you validate assumptions?
High-level design | Is the design sound? Does it address the requirements?
Deep dive | Can you go deep on specific components? Do you know trade-offs?
Wrap-up | Do you identify weaknesses? Do you know what to monitor?

Red Flags to Avoid​

  • Jumping to solutions without requirements
  • Designing a single-server system
  • Not acknowledging trade-offs
  • Silence: always think out loud
  • Designing a perfect system before validating the basics
  • Ignoring failure modes

Green Flags to Show​

  • Explicit assumptions
  • Quantitative reasoning ("at 10,000 QPS, a single DB can handle this...")
  • Trade-off awareness ("I'm choosing X over Y because...")
  • Proactive failure mode discussion
  • Incrementally evolving the design

Quick Reference: Technology Selection​

Need | Technology
Relational data, ACID | PostgreSQL
Document store | MongoDB
Key-value / cache | Redis
Column-family, time-series | Cassandra
Full-text search | Elasticsearch
Message queue | Kafka (streaming) / RabbitMQ (task queue)
Object storage | S3 / GCS
CDN | CloudFront / Fastly
Service mesh | Istio / Linkerd
Container orchestration | Kubernetes
Distributed lock | Redis / Zookeeper
API Gateway | Kong / AWS API Gateway / Spring Cloud Gateway