Skip to main content

Rate Limiter

Difficulty: Hard | Frequency: Very High | Patterns: Strategy, Decorator, Singleton


Interview Expectationโ€‹

Rate Limiter is a senior-level problem. The interviewer expects you to:

ExpectationDetails
Know multiple algorithmsToken Bucket, Fixed Window, Sliding Window Log, Sliding Window Counter
Apply Strategy patternSwap algorithms without changing the limiter interface
Handle concurrency correctlyAtomic check-and-decrement, no TOCTOU bugs
Discuss trade-offsMemory vs accuracy, simplicity vs precision
Think about scalePer-user limiting, distributed limiting

Step 1: Clarify Requirementsโ€‹

  • Scope: Per user/IP, or global?
  • Limit: Requests per second? Per minute?
  • Algorithm preference? (if none โ†’ propose Token Bucket with justification)
  • Behavior on limit exceeded: reject (429), queue, or shed load?
  • Distributed? Single node first, mention Redis for distributed

Step 2: Interface Designโ€‹

public interface RateLimiter {
/**
* @return true if the request is allowed, false if rate limited
*/
boolean isAllowed(String clientId);

/**
* Attempt to consume {@code tokens} permits.
*/
boolean tryConsume(String clientId, int tokens);
}

Step 3: Algorithm Implementationsโ€‹

Algorithm 1 โ€” Fixed Window Counterโ€‹

Simple but has boundary burst problem.

public class FixedWindowRateLimiter implements RateLimiter {
private final int limit;
private final long windowSizeMs;

// clientId โ†’ [windowStart, count]
private final ConcurrentHashMap<String, long[]> windows = new ConcurrentHashMap<>();

public FixedWindowRateLimiter(int limit, long windowSizeMs) {
this.limit = limit;
this.windowSizeMs = windowSizeMs;
}

@Override
public boolean isAllowed(String clientId) {
return tryConsume(clientId, 1);
}

@Override
public boolean tryConsume(String clientId, int tokens) {
long now = System.currentTimeMillis();

// Atomic update with compute()
long[] result = windows.compute(clientId, (id, window) -> {
if (window == null || now - window[0] >= windowSizeMs) {
// New or expired window
return new long[]{now, tokens};
}
// Existing window
window[1] += tokens;
return window;
});

return result[1] <= limit;
}
}

/*
* โŒ Flaw: A client can send 100 req at 0:59 and 100 req at 1:01
* โ€” 200 requests in 2 seconds, but both windows show exactly 100.
* This is the "boundary burst" problem.
*/

Smooth rate limiting, allows controlled bursts.

public class TokenBucketRateLimiter implements RateLimiter {
private final int capacity; // Burst capacity
private final double refillRatePerMs; // Tokens added per millisecond

// Per-client bucket state: [tokens, lastRefillTime]
private final ConcurrentHashMap<String, double[]> buckets = new ConcurrentHashMap<>();

public TokenBucketRateLimiter(int capacity, int refillPerSecond) {
this.capacity = capacity;
this.refillRatePerMs = refillPerSecond / 1000.0;
}

@Override
public boolean isAllowed(String clientId) {
return tryConsume(clientId, 1);
}

@Override
public boolean tryConsume(String clientId, int tokens) {
long now = System.currentTimeMillis();

// Atomic compute ensures no TOCTOU race
double[] result = new double[1];
buckets.compute(clientId, (id, bucket) -> {
double currentTokens;
if (bucket == null) {
currentTokens = capacity; // Start full
} else {
long elapsedMs = now - (long) bucket[1];
double refilled = elapsedMs * refillRatePerMs;
currentTokens = Math.min(capacity, bucket[0] + refilled);
}

if (currentTokens >= tokens) {
result[0] = currentTokens - tokens; // allowed
return new double[]{currentTokens - tokens, now};
} else {
result[0] = -1; // denied
return new double[]{currentTokens, now}; // still update timestamp
}
});

return result[0] >= 0;
}
}

Algorithm 3 โ€” Sliding Window Logโ€‹

Most accurate, but O(N) memory per client.

public class SlidingWindowLogRateLimiter implements RateLimiter {
private final int limit;
private final long windowMs;
private final ConcurrentHashMap<String, Deque<Long>> logs = new ConcurrentHashMap<>();

public SlidingWindowLogRateLimiter(int limit, long windowMs) {
this.limit = limit;
this.windowMs = windowMs;
}

@Override
public boolean isAllowed(String clientId) {
return tryConsume(clientId, 1);
}

@Override
public synchronized boolean tryConsume(String clientId, int tokens) {
long now = System.currentTimeMillis();
long windowStart = now - windowMs;

Deque<Long> log = logs.computeIfAbsent(clientId, id -> new ArrayDeque<>());

// Remove expired entries
while (!log.isEmpty() && log.peekFirst() <= windowStart) {
log.pollFirst();
}

if (log.size() + tokens <= limit) {
for (int i = 0; i < tokens; i++) {
log.addLast(now);
}
return true;
}
return false;
}
}

Algorithm 4 โ€” Sliding Window Counter (Space-Efficient)โ€‹

Approximates sliding window using two fixed windows.

public class SlidingWindowCounterRateLimiter implements RateLimiter {
private final int limit;
private final long windowMs;
private final ConcurrentHashMap<String, long[]> windows = new ConcurrentHashMap<>();
// windows[clientId] = [currentWindowStart, currentCount, previousCount]

public SlidingWindowCounterRateLimiter(int limit, long windowMs) {
this.limit = limit;
this.windowMs = windowMs;
}

@Override
public boolean isAllowed(String clientId) {
return tryConsume(clientId, 1);
}

@Override
public boolean tryConsume(String clientId, int tokens) {
long now = System.currentTimeMillis();

long[] window = windows.compute(clientId, (id, w) -> {
if (w == null) return new long[]{now, 0, 0};

long windowStart = w[0];
long elapsed = now - windowStart;

if (elapsed >= 2 * windowMs) {
// Both windows expired
return new long[]{now, 0, 0};
} else if (elapsed >= windowMs) {
// Current window expired โ€” shift windows
return new long[]{windowStart + windowMs, 0, w[1]};
}
return w; // Same window
});

long windowStart = window[0];
double positionInWindow = (double)(now - windowStart) / windowMs;

// Weighted count: previousCount * (1 - position) + currentCount
double weightedCount = window[2] * (1.0 - positionInWindow) + window[1];

if (weightedCount + tokens <= limit) {
window[1] += tokens;
return true;
}
return false;
}
}

Step 4: Strategy Pattern + Factoryโ€‹

public enum RateLimitAlgorithm { FIXED_WINDOW, TOKEN_BUCKET, SLIDING_WINDOW_LOG, SLIDING_WINDOW_COUNTER }

public class RateLimiterFactory {
public static RateLimiter create(RateLimitAlgorithm algorithm, int limit, long windowMs) {
return switch (algorithm) {
case FIXED_WINDOW -> new FixedWindowRateLimiter(limit, windowMs);
case TOKEN_BUCKET -> new TokenBucketRateLimiter(limit, (int)(limit * 1000 / windowMs));
case SLIDING_WINDOW_LOG -> new SlidingWindowLogRateLimiter(limit, windowMs);
case SLIDING_WINDOW_COUNTER -> new SlidingWindowCounterRateLimiter(limit, windowMs);
};
}
}

Step 5: Decorator โ€” Logging & Metricsโ€‹

public class MetricsRateLimiter implements RateLimiter {
private final RateLimiter delegate;
private final AtomicLong allowed = new AtomicLong();
private final AtomicLong rejected = new AtomicLong();

public MetricsRateLimiter(RateLimiter delegate) {
this.delegate = delegate;
}

@Override
public boolean isAllowed(String clientId) {
boolean result = delegate.isAllowed(clientId);
if (result) allowed.incrementAndGet();
else rejected.incrementAndGet();
return result;
}

@Override
public boolean tryConsume(String clientId, int tokens) {
return delegate.tryConsume(clientId, tokens);
}

public double getRejectionRate() {
long total = allowed.get() + rejected.get();
return total == 0 ? 0 : (double) rejected.get() / total;
}
}

Algorithm Trade-offsโ€‹

AlgorithmMemoryAccuracyBurst HandlingBest For
Fixed WindowO(1)/clientLow (boundary bursts)Allows 2x burst at boundarySimple use cases
Token BucketO(1)/clientHighYes, controlled burstAPIs, general use
Sliding Window LogO(N)/clientPerfectNoLow-traffic, strict SLA
Sliding Window CounterO(1)/client~99% accurateNoHigh-traffic approximation

Senior Deep Dive: Distributed Rate Limitingโ€‹

Senior Deep Dive ๐Ÿ”ด

Single node: The above implementations use in-memory state โ€” fast, but state is lost on restart and doesn't work across multiple service instances.

Distributed with Redis (Lua script for atomicity):

-- token_bucket.lua โ€” runs atomically on Redis
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2])
local tokens = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local current = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now

local elapsed = (now - last) / 1000
local refilled = math.min(capacity, current + elapsed * refill)

if refilled >= tokens then
redis.call('HMSET', key, 'tokens', refilled - tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return 1
else
redis.call('HMSET', key, 'tokens', refilled, 'last_refill', now)
return 0
end

Sticky sessions: Route each client to the same rate limiter node using consistent hashing โ€” avoids distributed state entirely.


Interview Checklistโ€‹

  • Clarified: limit number, window size, per-user vs global
  • Defined clean RateLimiter interface
  • Implemented at least Token Bucket in detail
  • Used ConcurrentHashMap.compute() for atomic check-and-update
  • Applied Strategy pattern for algorithm swappability
  • Compared algorithm trade-offs (memory, accuracy, burst)
  • Mentioned distributed approach (Redis Lua scripts or sticky routing)

See Alsoโ€‹

  • Rate Limiting Algorithms: Deep dive into the mathematical models, workflow details, and comparative trade-offs of all core rate-limiting algorithms.