Bloom Filters
A Bloom filter is a space-efficient, probabilistic data structure for testing whether an element is a member of a set. It trades memory efficiency for a small false positive rate (but zero false negatives).
๐ Table of Contentsโ
- Problem Bloom Filters Solve
- How Bloom Filters Work
- Java Implementation
- Real-World Use Cases
- Space Complexity Analysis
- Tradeoffs & Limitations
- Variants
- How Bloom Filters Work Internally
- Integration Patterns
- Pros and Cons
- Interview Questions
- Senior Deep Dive: Advanced Topics
๐ฏ 1. Problem Bloom Filters Solveโ
The Challengeโ
In many systems, we need to quickly determine whether an element exists in a large set without storing the entire set in memory.
Without Bloom filters:
Check if user_id exists in database?
โ Query DB (~100ms) โ High latency
โ Query cache โ Cache miss on new users
โ Query DB anyway โ Wasted resources
With Bloom filters:
Check if user_id might exist?
โ Bloom filter (~1ฮผs) โ "Definitely not" or "Maybe"
โ If "Maybe" โ Query DB (amortized lower DB load)
โ Zero false negatives โ Never miss real data
Real-World Scenariosโ
Scenario 1: Web Crawler URL Deduplication
- Problem: Crawler encounters billions of URLs
- Solution: Bloom filter to track visited URLs
- Benefit: 95% memory reduction vs. HashSet
Scenario 2: Database Query Optimization
- Problem: Frequent queries for non-existent keys
- Solution: Bloom filter to filter out non-existent keys
- Benefit: 80% reduction in database load
Scenario 3: Password Breach Detection
- Problem: Check if password is in known breach list
- Solution: Bloom filter for quick pre-check
- Benefit: Fast response without storing actual passwords
Performance Comparisonโ
| Operation | HashSet | Bloom Filter | Improvement |
|---|---|---|---|
| Memory (1M elements) | ~32 MB | ~1.2 MB | 96% reduction |
| Lookup Time | O(1) avg | O(k) hashing | Similar |
| False Negatives | 0 | 0 | Same |
| False Positives | 0 | 1% (configurable) | Trade-off |
โ๏ธ 2. How Bloom Filters Workโ
Core Conceptโ
A Bloom filter uses k independent hash functions to map elements into m bit positions in a bit array.
Insertion:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Bit Array (initially all 0s) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ [0][0][0][1][0][1][0][1][0][0][...]โ
โ โ โ โ
โ hash1 hash2 hash3
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Insert "user_123":
hash1("user_123") % m = 3 โ set bit[3] = 1
hash2("user_123") % m = 5 โ set bit[5] = 1
hash3("user_123") % m = 7 โ set bit[7] = 1
Lookup Processโ
Lookup "user_456":
hash1("user_456") % m = 3 โ bit[3] = 1 โ
hash2("user_456") % m = 5 โ bit[5] = 0 โ
Result: "Definitely NOT in set" (false positive impossible)
Lookup "user_123":
hash1("user_123") % m = 3 โ bit[3] = 1 โ
hash2("user_123") % m = 5 โ bit[5] = 1 โ
hash3("user_123") % m = 7 โ bit[7] = 1 โ
Result: "Probably in set" (but small chance of false positive)
Visual Representationโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Bloom Filter Structure โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ Bit Array (m bits): โ
โ โโโโโฌโโโโฌโโโโฌโโโโฌโโโโฌโโโโฌโโโโฌโโโโฌโโโโฌโโโโฌโโโโฌโโโโ โ
โ โ 0 โ 0 โ 1 โ 0 โ 1 โ 0 โ 1 โ 0 โ 0 โ 1 โ 0 โ 0 โ ... โ โ
โ โโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโดโโโโ โ
โ 0 1 2 3 4 5 6 7 8 9 10 11 โ
โ โ
โ Hash Functions (k = 3): โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ hash1(x) = murmur3_32(x) % m โ โ
โ โ hash2(x) = murmur3_32(x + seed1) % m โ โ
โ โ hash3(x) = murmur3_32(x + seed2) % m โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ Insert "apple": โ
โ hash1("apple") = 2 โ set bit[2] = 1 โ
โ hash2("apple") = 4 โ set bit[4] = 1 โ
โ hash3("apple") = 6 โ set bit[6] = 1 โ
โ โ
โ Lookup "banana": โ
โ hash1("banana") = 3 โ bit[3] = 0 โ โ
โ Result: Definitely NOT in set โ
โ โ
โ Lookup "apple": โ
โ hash1("apple") = 2 โ bit[2] = 1 โ โ
โ hash2("apple") = 4 โ bit[4] = 1 โ โ
โ hash3("apple") = 6 โ bit[6] = 1 โ โ
โ Result: Probably in set (may be false positive) โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Mathematical Propertiesโ
False Positive Probability:
The probability of a false positive is approximately:
P(false\_positive) \approx (1 - e^{-kn/m})^k
Where:
k= number of hash functionsn= number of elements insertedm= size of bit array
Optimal Number of Hash Functions:
k = \frac{m}{n} \ln(2) \approx 0.693 \frac{m}{n}
Optimal Bit Array Size:
m = -\frac{n \ln(p)}{(\ln(2))^2} \approx -1.44n \log_2(p)
Where p is the desired false positive rate.
๐ป 3. Java Implementationโ
Using Guava Library (Recommended)โ
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
@Service
public class UserBloomFilterService {
// Create a Bloom filter for 100k users with 1% false positive rate
private final BloomFilter<Long> userIdFilter = BloomFilter.create(
Funnels.longFunnel(), // Hash funnel for Long
100_000, // Expected insertions
0.01 // False positive probability (1%)
);
public void initializeFilter(List<Long> userIds) {
userIds.forEach(userIdFilter::put);
}
// Check before expensive DB query
public User getUserIfExists(Long userId) {
if (!userIdFilter.mightContain(userId)) {
// Definitely not in DB โ early return
return null;
}
// Might be in DB โ check DB
return userRepository.findById(userId).orElse(null);
}
// Add new user to filter
public void addUser(Long userId) {
userIdFilter.put(userId);
}
}
String Bloom Filter with Guavaโ
@Service
public class UrlBloomFilterService {
private final BloomFilter<String> urlFilter = BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
10_000_000, // 10 million URLs
0.001 // 0.1% false positive rate
);
public boolean isUrlCrawled(String url) {
if (!urlFilter.mightContain(url)) {
return false; // Definitely not crawled
}
// Might be crawled (could be false positive)
return true;
}
public void markUrlAsCrawled(String url) {
urlFilter.put(url);
}
}
Custom Implementation with Apache Commonsโ
import org.apache.commons.collections4.BloomFilter;
import org.apache.commons.collections4.functors.ByteTransformer;
public class BasicBloomFilter<T> {
private final int size;
private final byte[] bitArray;
private final int hashCount;
public BasicBloomFilter(int expectedElements, double falsePositiveRate) {
// Calculate optimal bit array size
this.size = optimalBitArraySize(expectedElements, falsePositiveRate);
// Calculate optimal number of hash functions
this.hashCount = optimalHashCount(size, expectedElements);
this.bitArray = new byte[(size + 7) / 8];
}
public void add(T element) {
for (int i = 0; i < hashCount; i++) {
int hashCode = hash(element, i);
int position = Math.abs(hashCode % size);
setBit(position);
}
}
public boolean mightContain(T element) {
for (int i = 0; i < hashCount; i++) {
int hashCode = hash(element, i);
int position = Math.abs(hashCode % size);
if (!isBitSet(position)) {
return false; // Definitely not present
}
}
return true; // Probably present (or false positive)
}
private int hash(T element, int seed) {
return element.hashCode() ^ seed * 31;
}
private void setBit(int position) {
int byteIndex = position / 8;
int bitIndex = position % 8;
bitArray[byteIndex] |= (byte)(1 << bitIndex);
}
private boolean isBitSet(int position) {
int byteIndex = position / 8;
int bitIndex = position % 8;
return (bitArray[byteIndex] & (1 << bitIndex)) != 0;
}
private static int optimalBitArraySize(int n, double p) {
// m = -(n * ln(p)) / (ln(2)^2)
return (int)(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
}
private static int optimalHashCount(int m, int n) {
// k = (m / n) * ln(2)
return Math.max(1, (int)(m / n * Math.log(2)));
}
}
Python Implementationโ
import hashlib
import math
class BloomFilter:
def __init__(self, expected_elements, false_positive_rate=0.01):
self.size = self._calculate_size(expected_elements, false_positive_rate)
self.hash_count = self._calculate_hash_count(self.size, expected_elements)
self.bit_array = [0] * self.size
def _calculate_size(self, n, p):
"""Calculate optimal bit array size"""
return int(-n * math.log(p) / (math.log(2) ** 2))
def _calculate_hash_count(self, m, n):
"""Calculate optimal number of hash functions"""
return int(m / n * math.log(2))
def _hash(self, item, seed):
"""Generate hash value with seed"""
item_str = f"{item}{seed}".encode('utf-8')
hash_value = hashlib.md5(item_str).hexdigest()
return int(hash_value, 16)
def add(self, item):
"""Add item to bloom filter"""
for i in range(self.hash_count):
position = self._hash(item, i) % self.size
self.bit_array[position] = 1
def might_contain(self, item):
"""Check if item might be in bloom filter"""
for i in range(self.hash_count):
position = self._hash(item, i) % self.size
if self.bit_array[position] == 0:
return False # Definitely not present
return True # Probably present
# Usage
bloom = BloomFilter(expected_elements=100000, false_positive_rate=0.01)
bloom.add("user_123")
print(bloom.might_contain("user_123")) # True
print(bloom.might_contain("user_456")) # False (probably)
๐ 4. Real-World Use Casesโ
1. Cache Penetration Preventionโ
// Prevent repeated lookups for non-existent users
@Service
public class CacheService {
private final BloomFilter<Long> deletedUserIds = BloomFilter.create(
Funnels.longFunnel(), 10_000_000, 0.001
);
public User getUser(Long userId) {
// Skip DB entirely for known-deleted users
if (deletedUserIds.mightContain(userId)) {
return null;
}
return cache.getOrFetch(userId, () -> db.findUser(userId));
}
@Transactional
public void deleteUser(Long userId) {
db.delete(userId);
deletedUserIds.put(userId); // Add to filter
}
}
2. Duplicate Prevention in Stream Processingโ
// Kafka consumer: prevent processing duplicate events
@Service
public class EventProcessor {
private final BloomFilter<String> processedEvents = BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
1_000_000,
0.0001 // 0.01% false positive rate for critical deduplication
);
@KafkaListener(topics = "events")
public void processEvent(Event event) {
if (processedEvents.mightContain(event.getId())) {
log.warn("Duplicate event: {}", event.getId());
return; // Skip if already processed
}
// Safe to process
service.handle(event);
processedEvents.put(event.getId());
}
}
3. URL Deduplication in Web Crawlersโ
// HyperLogLog + Bloom filters in URL deduplication
@Service
public class WebCrawler {
private final BloomFilter<String> crawledUrls = BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
100_000_000, // 100M URLs
0.0001 // 0.01% false positive
);
public void crawlPage(String url) {
if (crawledUrls.mightContain(url)) {
return; // Already crawled
}
// Fetch and parse
Page page = fetchPage(url);
processPage(page);
// Mark as processed
crawledUrls.put(url);
}
}
4. User ID Existence Checkโ
// LinkedIn-like: bulk operations checking which users exist
@Service
public class UserService {
private final BloomFilter<Long> activeUserIds;
@PostConstruct
public void loadUserIds() {
List<Long> allUserIds = userRepository.findAllIds();
this.activeUserIds = BloomFilter.create(
Funnels.longFunnel(),
allUserIds.size(),
0.001 // 0.1% false positive
);
allUserIds.forEach(activeUserIds::put);
}
public List<Long> filterExistingUsers(List<Long> userIds) {
return userIds.stream()
.filter(activeUserIds::mightContain)
.filter(id -> userRepository.existsById(id)) // Double-check
.collect(Collectors.toList());
}
}
5. Password Breach Detectionโ
// Check if password is in known breach list
@Service
public class PasswordSecurityService {
private final BloomFilter<String> breachedPasswords;
@PostConstruct
public void loadBreachedPasswords() {
// Load from HaveIBeenPwned or similar service
List<String> passwords = loadBreachedPasswordHashes();
this.breachedPasswords = BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
passwords.size(),
0.0001 // Very low false positive rate
);
passwords.forEach(breachedPasswords::put);
}
public boolean isPasswordBreached(String passwordHash) {
if (!breachedPasswords.mightContain(passwordHash)) {
return false; // Definitely not breached
}
// Double-check with actual database
return passwordDatabase.contains(passwordHash);
}
}
6. Content Recommendation Systemโ
// Track seen content to avoid recommendations
@Service
public class RecommendationService {
private final Map<Long, BloomFilter<String>> userSeenContent = new ConcurrentHashMap<>();
public List<Content> getRecommendations(Long userId) {
BloomFilter<String> seenFilter = userSeenContent.computeIfAbsent(
userId,
id -> BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
10000,
0.01
)
);
return contentRepository.findAll().stream()
.filter(content -> !seenFilter.mightContain(content.getId()))
.limit(10)
.collect(Collectors.toList());
}
public void markAsSeen(Long userId, String contentId) {
BloomFilter<String> filter = userSeenContent.get(userId);
if (filter != null) {
filter.put(contentId);
}
}
}
๐ 5. Space Complexity Analysisโ
Bloom Filter vs Alternativesโ
| Data Structure | Space (1M elements) | Lookup | False Negatives | False Positives |
|---|---|---|---|---|
HashSet<Long> | ~32 MB (8 bytes ร 1M + overhead) | O(1) avg | 0 | 0 |
| Bloom Filter | ~1.2 MB (with 1% FP rate) | O(k) hashing | 0 | 1% (configurable) |
TreeSet<Long> | ~40 MB (per-node overhead) | O(log n) | 0 | 0 |
| BitSet | ~125 KB (1M bits) | O(1) | 0 | High (collisions) |
Savings: Bloom filter uses 96% less space than HashSet with under 1% false positive rate.
Memory Usage Examplesโ
Example 1: 1 Million Elements
HashSet: 32 MB
Bloom Filter (1% FP): 1.2 MB
Savings: 30.8 MB (96%)
Example 2: 10 Million Elements
HashSet: 320 MB
Bloom Filter (0.1% FP): 19.2 MB
Savings: 300.8 MB (94%)
Example 3: 100 Million Elements
HashSet: 3.2 GB
Bloom Filter (0.01% FP): 288 MB
Savings: 2.9 GB (91%)
Formula for Optimal Parametersโ
Given:
n = expected number of elements
p = desired false positive rate
Optimal bit array size:
m = -(n * ln(p)) / (ln(2)ยฒ) โ 1.44 * n * logโ(1/p) bits
Optimal number of hash functions:
k = (m / n) * ln(2)
Example: n=1,000,000, p=0.01 (1%)
m = -(1M * ln(0.01)) / 0.48 โ 9.6 million bits โ 1.2 MB
k = (9.6M / 1M) * 0.693 โ 7 hash functions
False Positive Rate vs. Size Trade-offโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ False Positive Rate vs. Memory Usage โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ FP Rate โ Memory (1M elements) โ Hash Functions โ
โ โโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโ โ
โ 10% โ 0.6 MB โ 4 โ โ
โ 5% โ 0.9 MB โ 5 โ โ
โ 1% โ 1.2 MB โ 7 โ โ
โ 0.1% โ 1.8 MB โ 10 โ โ
โ 0.01% โ 2.4 MB โ 13 โ โ
โ โ
โ Trade-off: Lower FP rate requires more memory and more โ
โ hash functions (slower lookups) โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ๏ธ 6. Tradeoffs & Limitationsโ
Advantagesโ
โ Space-efficient: 1-2 bits per element vs 64+ bits per element in HashSet โ O(1) lookup: Constant time regardless of set size โ Zero false negatives: "Definitely not" is always correct โ Predictable memory: No dynamic resizing โ Fast operations: Hash computation is very fast โ No locking: Can be used in concurrent scenarios โ Simple implementation: Easy to understand and implement
Disadvantagesโ
โ False positives: Cannot be eliminated, only controlled โ No deletion: Standard Bloom filters don't support efficient removal โ Counting Bloom filters: Support deletion but use more space โ Hash collision risk: Quality of hash functions affects accuracy โ Fixed size: Must know expected number of elements in advance โ No ordering: Cannot iterate over elements โ Limited operations: Only supports add and contains operations
When to Use Bloom Filtersโ
Use Bloom Filters when:
- Memory is constrained and you can tolerate false positives
- You need fast membership testing
- You can verify false positives with a secondary data structure
- The set is large and static (or slowly changing)
- False negatives are unacceptable
Avoid Bloom Filters when:
- You need exact membership testing
- You need to support deletions frequently
- The set is very small (overhead not worth it)
- You need to iterate over elements
- False positives are unacceptable
๐ 7. Variantsโ
Counting Bloom Filter (with deletions)โ
// Supports add() and remove() operations
public class CountingBloomFilter<T> {
private final int[] counts; // Use 4-bit counters instead of bits
private final int size;
private final int hashCount;
public CountingBloomFilter(int expectedElements, double falsePositiveRate) {
this.size = optimalBitArraySize(expectedElements, falsePositiveRate);
this.hashCount = optimalHashCount(size, expectedElements);
this.counts = new int[size];
}
public void put(T element) {
for (int i = 0; i < hashCount; i++) {
int position = hash(element, i) % size;
counts[position]++; // Increment counter
}
}
public void remove(T element) {
for (int i = 0; i < hashCount; i++) {
int position = hash(element, i) % size;
if (counts[position] > 0) {
counts[position]--; // Decrement counter
}
}
}
public boolean mightContain(T element) {
for (int i = 0; i < hashCount; i++) {
int position = hash(element, i) % size;
if (counts[position] == 0) {
return false;
}
}
return true;
}
}
Trade-offs:
- Pros: Supports deletions
- Cons: Uses more memory (4-8 bits per position vs. 1 bit)
Scalable Bloom Filterโ
// Dynamically grows without full rebuild
public class ScalableBloomFilter<T> {
private final List<BloomFilter<T>> filters = new ArrayList<>();
private final double scalingFactor = 2.0;
private final double falsePositiveRate;
public ScalableBloomFilter(double falsePositiveRate) {
this.falsePositiveRate = falsePositiveRate;
}
public void add(T element) {
if (filters.isEmpty() || !getCurrentFilter().mightContain(element)) {
// Create new larger Bloom filter
addNewFilter();
getCurrentFilter().put(element);
}
}
public boolean mightContain(T element) {
return filters.stream()
.anyMatch(filter -> filter.mightContain(element));
}
private BloomFilter<T> getCurrentFilter() {
return filters.get(filters.size() - 1);
}
private void addNewFilter() {
int size = filters.isEmpty() ? 1000 :
(int)(getCurrentFilter().size() * scalingFactor);
double fpRate = filters.isEmpty() ? falsePositiveRate :
getCurrentFilter().falsePositiveRate() * 0.9;
filters.add(new BloomFilter<>(size, fpRate));
}
}
Trade-offs:
- Pros: Handles unknown or growing datasets
- Cons: Slightly higher memory usage
Bloom Filter with Expirationโ
// Supports time-based expiration
public class ExpiringBloomFilter<T> {
private final BloomFilter<T> filter;
private final Map<T, Long> timestamps = new ConcurrentHashMap<>();
private final long ttl; // Time to live in milliseconds
public ExpiringBloomFilter(int expectedElements, double falsePositiveRate, long ttl) {
this.filter = new BloomFilter<>(expectedElements, falsePositiveRate);
this.ttl = ttl;
}
public void add(T element) {
filter.put(element);
timestamps.put(element, System.currentTimeMillis());
}
public boolean mightContain(T element) {
if (!filter.mightContain(element)) {
return false;
}
// Check if expired
Long timestamp = timestamps.get(element);
if (timestamp == null || System.currentTimeMillis() - timestamp > ttl) {
return false;
}
return true;
}
public void cleanup() {
long now = System.currentTimeMillis();
timestamps.entrySet().removeIf(entry ->
now - entry.getValue() > ttl);
}
}
Distributed Bloom Filterโ
// Bloom filter across multiple nodes
public class DistributedBloomFilter<T> {
private final List<BloomFilter<T>> filters;
private final int replicationFactor;
public DistributedBloomFilter(List<BloomFilter<T>> filters, int replicationFactor) {
this.filters = filters;
this.replicationFactor = replicationFactor;
}
public void add(T element) {
// Add to multiple filters for redundancy
for (int i = 0; i < replicationFactor && i < filters.size(); i++) {
filters.get(i).put(element);
}
}
public boolean mightContain(T element) {
// Check if any filter contains the element
for (BloomFilter<T> filter : filters) {
if (filter.mightContain(element)) {
return true;
}
}
return false;
}
}
โ๏ธ 8. How Bloom Filters Work Internallyโ
Hash Function Selectionโ
The choice of hash function significantly impacts performance and distribution:
Common Hash Functionsโ
| Hash Function | Output Size | Speed | Distribution | Use Case |
|---|---|---|---|---|
| MurmurHash3 | 32/128-bit | Very Fast | Excellent | General purpose |
| xxHash | 64-bit | Very Fast | Excellent | High-performance |
| CityHash | 64/128-bit | Very Fast | Excellent | High-performance |
| FNV | 32/64-bit | Fast | Good | Simple implementations |
| Jenkins | 32-bit | Fast | Good | General purpose |
Recommendation: Use MurmurHash3 or xxHash for performance-critical applications.
Multiple Hash Functionsโ
Instead of using multiple independent hash functions, we can use a single hash function with different seeds:
public class BloomFilter<T> {
private final int[] seeds;
public BloomFilter(int size, int hashCount) {
this.size = size;
this.hashCount = hashCount;
this.seeds = new int[hashCount];
for (int i = 0; i < hashCount; i++) {
seeds[i] = i; // Use index as seed
}
}
private int[] getHashPositions(T element) {
int[] positions = new int[hashCount];
for (int i = 0; i < hashCount; i++) {
positions[i] = Math.abs(hash(element, seeds[i]) % size);
}
return positions;
}
}
Bit Array Implementationโ
Efficient Bit Arrayโ
public class BitArray {
private final long[] bits;
private final int size;
public BitArray(int size) {
this.size = size;
this.bits = new long[(size + 63) / 64]; // Use long for efficiency
}
public void set(int position) {
int arrayIndex = position / 64;
int bitIndex = position % 64;
bits[arrayIndex] |= (1L << bitIndex);
}
public boolean get(int position) {
int arrayIndex = position / 64;
int bitIndex = position % 64;
return (bits[arrayIndex] & (1L << bitIndex)) != 0;
}
}
Memory-Mapped Bit Arrayโ
For very large Bloom filters, use memory-mapped files:
public class MappedBitArray {
private final MappedByteBuffer buffer;
private final int size;
public MappedBitArray(int size, String filePath) throws IOException {
this.size = size;
RandomAccessFile file = new RandomAccessFile(filePath, "rw");
FileChannel channel = file.getChannel();
this.buffer = channel.map(
FileChannel.MapMode.READ_WRITE,
0,
(size + 7) / 8
);
}
public void set(int position) {
int byteIndex = position / 8;
int bitIndex = position % 8;
byte value = buffer.get(byteIndex);
buffer.put(byteIndex, (byte)(value | (1 << bitIndex)));
}
public boolean get(int position) {
int byteIndex = position / 8;
int bitIndex = position % 8;
byte value = buffer.get(byteIndex);
return (value & (1 << bitIndex)) != 0;
}
}
Concurrency Considerationsโ
Thread-Safe Bloom Filterโ
import java.util.concurrent.atomic.AtomicLongArray;
public class ConcurrentBloomFilter<T> {
private final AtomicLongArray bits;
private final int size;
private final int hashCount;
public ConcurrentBloomFilter(int size, int hashCount) {
this.size = size;
this.hashCount = hashCount;
this.bits = new AtomicLongArray((size + 63) / 64);
}
public void put(T element) {
for (int i = 0; i < hashCount; i++) {
int position = hash(element, i) % size;
setBit(position);
}
}
public boolean mightContain(T element) {
for (int i = 0; i < hashCount; i++) {
int position = hash(element, i) % size;
if (!getBit(position)) {
return false;
}
}
return true;
}
private void setBit(int position) {
int arrayIndex = position / 64;
int bitIndex = position % 64;
long mask = 1L << bitIndex;
long oldValue, newValue;
do {
oldValue = bits.get(arrayIndex);
newValue = oldValue | mask;
} while (!bits.compareAndSet(arrayIndex, oldValue, newValue));
}
private boolean getBit(int position) {
int arrayIndex = position / 64;
int bitIndex = position % 64;
return (bits.get(arrayIndex) & (1L << bitIndex)) != 0;
}
}
Performance Optimizationโ
SIMD Hashingโ
Use SIMD instructions for faster hash computation:
#include <immintrin.h>
void simd_hash(const uint8_t* data, size_t len, uint64_t* hashes, int hash_count) {
__m256i accumulator = _mm256_setzero_si256();
// Process 32 bytes at a time
for (size_t i = 0; i + 32 <= len; i += 32) {
__m256i chunk = _mm256_loadu_si256((__m256i*)(data + i));
accumulator = _mm256_xor_si256(accumulator, chunk);
}
// Generate multiple hash values
uint64_t result[4];
_mm256_storeu_si256((__m256i*)result, accumulator);
for (int i = 0; i < hash_count; i++) {
hashes[i] = result[i % 4] ^ (i * 0x9e3779b97f4a7c15L);
}
}
Cache-Friendly Implementationโ
Optimize for CPU cache locality:
public class CacheFriendlyBloomFilter<T> {
private final long[] bits;
private final int size;
private final int hashCount;
public CacheFriendlyBloomFilter(int size, int hashCount) {
this.size = size;
this.hashCount = hashCount;
// Ensure size is cache-line aligned
this.bits = new long[((size + 63) / 64 + 7) / 8 * 8];
}
public void put(T element) {
int[] positions = getHashPositions(element);
// Set all bits in a single pass
for (int position : positions) {
int arrayIndex = position / 64;
int bitIndex = position % 64;
bits[arrayIndex] |= (1L << bitIndex);
}
}
}
๐ 9. Integration Patternsโ
Pattern 1: Bloom Filter + Databaseโ
Use Bloom filter as a pre-check before database queries:
@Service
public class UserService {
private final BloomFilter<Long> userIdFilter;
private final UserRepository userRepository;
public User getUser(Long userId) {
// Fast pre-check
if (!userIdFilter.mightContain(userId)) {
return null; // Definitely doesn't exist
}
// Verify with database
return userRepository.findById(userId).orElse(null);
}
public void addUser(User user) {
userRepository.save(user);
userIdFilter.put(user.getId());
}
}
Pattern 2: Bloom Filter + Cacheโ
Use Bloom filter to prevent cache penetration:
@Service
public class CacheService {
private final BloomFilter<String> cacheKeys;
private final Cache cache;
private final Database database;
public Object get(String key) {
// Check if key might be in cache
if (!cacheKeys.mightContain(key)) {
return null; // Definitely not in cache
}
// Try cache
Object value = cache.get(key);
if (value != null) {
return value;
}
// Cache miss, load from database
value = database.get(key);
if (value != null) {
cache.put(key, value);
cacheKeys.put(key);
}
return value;
}
}
Pattern 3: Bloom Filter + Message Queueโ
Use Bloom filter for duplicate message detection:
@Service
public class MessageProcessor {
private final BloomFilter<String> processedMessages;
private final KafkaTemplate<String, String> kafkaTemplate;
@KafkaListener(topics = "messages")
public void processMessage(String message) {
String messageId = extractMessageId(message);
// Check if already processed
if (processedMessages.mightContain(messageId)) {
log.warn("Duplicate message: {}", messageId);
return;
}
// Process message
handleMessage(message);
// Mark as processed
processedMessages.put(messageId);
}
}
Pattern 4: Bloom Filter + Rate Limitingโ
Use Bloom filter for efficient rate limiting:
@Service
public class RateLimiter {
private final Map<String, BloomFilter<String>> userFilters;
private final int maxRequests;
private final long windowSize;
public boolean allowRequest(String userId, String requestId) {
BloomFilter<String> filter = userFilters.computeIfAbsent(
userId,
id -> createNewFilter()
);
if (filter.mightContain(requestId)) {
return false; // Already seen this request
}
filter.put(requestId);
return true;
}
private BloomFilter<String> createNewFilter() {
return BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
maxRequests,
0.01
);
}
}
Pattern 5: Bloom Filter + Searchโ
Use Bloom filter for search optimization:
@Service
public class SearchService {
private final BloomFilter<String> documentIds;
private final SearchEngine searchEngine;
public List<Document> search(String query) {
// Get candidate document IDs
List<String> candidateIds = searchEngine.getCandidateIds(query);
// Filter using Bloom filter
List<String> filteredIds = candidateIds.stream()
.filter(documentIds::mightContain)
.collect(Collectors.toList());
// Get actual documents
return searchEngine.getDocuments(filteredIds);
}
}
โ๏ธ 10. Pros and Consโ
Prosโ
โ Space-efficient: Uses 96% less memory than HashSet for large sets โ O(1) lookup: Constant time regardless of set size โ Zero false negatives: "Definitely not" is always correct โ Predictable memory: No dynamic resizing โ Fast operations: Hash computation is very fast โ No locking: Can be used in concurrent scenarios โ Simple implementation: Easy to understand and implement โ Configurable accuracy: Trade memory for accuracy
Consโ
โ False positives: Cannot be eliminated, only controlled โ No deletion: Standard Bloom filters don't support efficient removal โ Fixed size: Must know expected number of elements in advance โ No ordering: Cannot iterate over elements โ Limited operations: Only supports add and contains operations โ Hash quality dependent: Poor hash functions increase false positives โ Not suitable for small sets: Overhead not worth it for small datasets
When to Use Bloom Filtersโ
Use Bloom Filters when:
- Memory is constrained and you can tolerate false positives
- You need fast membership testing
- You can verify false positives with a secondary data structure
- The set is large and static (or slowly changing)
- False negatives are unacceptable
- You need to filter out a large number of non-matching items
Avoid Bloom Filters when:
- You need exact membership testing
- You need to support deletions frequently
- The set is very small (overhead not worth it)
- You need to iterate over elements
- False positives are unacceptable
- Memory is not a constraint
๐ 11. Interview Questionsโ
Beginner Levelโ
Q1: What is a Bloom filter? A: A Bloom filter is a space-efficient probabilistic data structure for testing whether an element is a member of a set. It can tell you definitively if an element is NOT in the set, but can only tell you PROBABLY if an element IS in the set (with a small false positive rate).
Q2: What are the main advantages of Bloom filters? A: The main advantages are: 1) Extremely space-efficient (uses 96% less memory than HashSet), 2) O(1) lookup time, 3) Zero false negatives, 4) Predictable memory usage, 5) Fast operations.
Q3: What are the main disadvantages of Bloom filters? A: The main disadvantages are: 1) False positives (cannot be eliminated), 2) No deletion support in standard implementation, 3) Fixed size (must know expected elements in advance), 4) No iteration support, 5) Limited operations (only add and contains).
Q4: How does a Bloom filter work? A: A Bloom filter uses k hash functions to map each element to k positions in a bit array. When adding an element, all k positions are set to 1. When checking membership, all k positions are checked - if any is 0, the element is definitely not in the set; if all are 1, the element is probably in the set (but could be a false positive).
Q5: What is the time complexity of Bloom filter operations? A: Both add and contains operations are O(k) where k is the number of hash functions. Since k is typically small (5-15), this is effectively O(1) constant time.
Intermediate Levelโ
Q6: How do you choose the size of a Bloom filter? A: The size depends on the expected number of elements (n) and desired false positive rate (p). The formula is: m = -(n * ln(p)) / (ln(2)ยฒ). For example, with 1 million elements and 1% false positive rate, you need approximately 1.2 MB.
Q7: How do you choose the number of hash functions? A: The optimal number of hash functions is k = (m/n) * ln(2), where m is the bit array size and n is the number of elements. For a 1.2 MB filter with 1 million elements, this gives approximately 7 hash functions.
Q8: How do you handle false positives in Bloom filters? A: False positives are handled by using a secondary data structure (like a database or HashSet) to verify the result. The Bloom filter acts as a fast pre-filter that eliminates most negative cases, reducing load on the secondary structure.
Q9: Can you delete elements from a Bloom filter? A: Standard Bloom filters don't support deletion because setting a bit back to 0 might affect other elements. However, Counting Bloom filters support deletion by using counters instead of bits, at the cost of increased memory usage.
Q10: How do Bloom filters compare to HashSets? A: Bloom filters use 96% less memory than HashSets but have a small false positive rate. HashSets provide exact membership testing but use much more memory. Bloom filters are better when memory is constrained and false positives are acceptable.
Advanced Levelโ
Q11: Design a system using Bloom filters for cache penetration prevention. A: Implement a Bloom filter to track all valid cache keys. Before checking the cache, first check the Bloom filter. If the filter says the key doesn't exist, skip the cache entirely. If it says the key might exist, check the cache. This prevents repeated cache misses for non-existent keys.
Q12: How do you implement a distributed Bloom filter? A: Implement a distributed Bloom filter by partitioning the data across multiple nodes using consistent hashing. Each node maintains its own Bloom filter for its partition. For membership testing, check the appropriate node's filter. For redundancy, replicate each partition across multiple nodes.
Q13: How do you optimize Bloom filter performance for high-throughput systems? A: Optimize by: 1) Using fast hash functions like MurmurHash3 or xxHash, 2) Using SIMD instructions for hash computation, 3) Implementing cache-friendly data structures, 4) Using lock-free concurrent implementations, 5) Pre-allocating memory to avoid dynamic resizing.
Q14: How do you handle Bloom filter overflow? A: Handle overflow by: 1) Using Scalable Bloom Filters that dynamically grow, 2) Implementing a tiered approach with multiple filters, 3) Monitoring false positive rate and rebuilding when it exceeds threshold, 4) Using a backup data structure for overflow cases.
Q15: How do you integrate Bloom filters with existing databases? A: Integrate by: 1) Creating a Bloom filter from existing data, 2) Updating the filter synchronously with database writes, 3) Using the filter as a pre-check before database queries, 4) Implementing periodic filter rebuilding to maintain accuracy, 5) Monitoring filter performance and adjusting parameters as needed.
Senior Levelโ
Q16: Design a URL deduplication system for a web crawler using Bloom filters. A: Implement a distributed Bloom filter system where each crawler node maintains its own filter. Use consistent hashing to partition URLs across nodes. Implement a scalable Bloom filter that grows as more URLs are discovered. Use a backup database for verification and periodic filter rebuilding. Implement monitoring for false positive rate and automatic adjustment.
Q17: How do you handle Bloom filter synchronization in distributed systems? A: Handle synchronization by: 1) Using gossip protocols to propagate filter updates, 2) Implementing vector clocks for conflict resolution, 3) Using CRDTs for merge operations, 4) Implementing periodic synchronization with conflict resolution, 5) Using a central coordinator for critical operations.
Q18: How do you optimize Bloom filter memory usage for very large datasets? A: Optimize by: 1) Using memory-mapped files for very large filters, 2) Implementing compression techniques, 3) Using tiered filters with different accuracy levels, 4) Implementing probabilistic data structures like HyperLogLog for cardinality estimation, 5) Using disk-based storage for cold data.
Q19: How do you debug Bloom filter issues in production? A: Debug by: 1) Monitoring false positive rate and comparing to expected, 2) Tracking hash function distribution, 3) Implementing comprehensive logging, 4) Using canary deployments for testing changes, 5) Implementing health checks and alerts, 6) Using chaos engineering to test failure scenarios.
Q20: How do you choose between Bloom filter and alternative data structures? A: Choose based on: 1) Memory constraints vs. accuracy requirements, 2) Need for deletion support, 3) Dataset size and growth rate, 4) Performance requirements, 5) Operational complexity, 6) Team expertise. Consider alternatives like Cuckoo filters, Quotient filters, or traditional data structures based on specific requirements.
๐ง 12. Senior Deep Dive: Advanced Topicsโ
Topic 1: Cuckoo Filterโ
Cuckoo filters are an alternative to Bloom filters that support deletion with better space efficiency.
public class CuckooFilter<T> {
private final Bucket[] buckets;
private final int bucketSize;
private final int fingerprintSize;
private final int maxKicks;
public CuckooFilter(int capacity, int fingerprintSize) {
this.bucketSize = 4;
this.fingerprintSize = fingerprintSize;
this.maxKicks = 500;
this.buckets = new Bucket[capacity / bucketSize];
}
public boolean insert(T item) {
byte[] fingerprint = getFingerprint(item);
int[] indices = getIndices(item);
// Try to insert in first bucket
if (buckets[indices[0]].hasSpace()) {
buckets[indices[0]].insert(fingerprint);
return true;
}
// Try to insert in second bucket
if (buckets[indices[1]].hasSpace()) {
buckets[indices[1]].insert(fingerprint);
return true;
}
// Relocate existing items
return relocate(indices[0], fingerprint);
}
public boolean contains(T item) {
byte[] fingerprint = getFingerprint(item);
int[] indices = getIndices(item);
return buckets[indices[0]].contains(fingerprint) ||
buckets[indices[1]].contains(fingerprint);
}
public boolean delete(T item) {
byte[] fingerprint = getFingerprint(item);
int[] indices = getIndices(item);
return buckets[indices[0]].delete(fingerprint) ||
buckets[indices[1]].delete(fingerprint);
}
private boolean relocate(int index, byte[] fingerprint) {
int current = index;
byte[] currentFingerprint = fingerprint;
for (int i = 0; i < maxKicks; i++) {
// Kick out random item
int kickIndex = buckets[current].getRandomIndex();
byte[] kickedFingerprint = buckets[current].get(kickIndex);
// Insert current fingerprint
buckets[current].set(kickIndex, currentFingerprint);
// Calculate alternate index for kicked fingerprint
int altIndex = getAltIndex(kickedFingerprint, current);
// Try to insert in alternate bucket
if (buckets[altIndex].hasSpace()) {
buckets[altIndex].insert(kickedFingerprint);
return true;
}
// Continue with kicked fingerprint
current = altIndex;
currentFingerprint = kickedFingerprint;
}
return false; // Table is full
}
}
Topic 2: Quotient Filterโ
Quotient filters provide better cache locality than Bloom filters.
public class QuotientFilter {
private final long[] filter;
private final int r; // Remainder bits
private final int q; // Quotient bits
public QuotientFilter(int capacity, int bitsPerElement) {
this.r = bitsPerElement;
this.q = (int)(Math.log(capacity) / Math.log(2));
this.filter = new long[1 << q];
}
public boolean insert(long hash) {
long quotient = hash >>> r;
long remainder = hash & ((1L << r) - 1);
int index = (int)quotient;
long slot = filter[index];
// Check if already present
if (isOccupied(slot) && getRemainder(slot) == remainder) {
return true;
}
// Insert
filter[index] = setOccupied(slot);
filter[index] = setRemainder(filter[index], remainder);
return true;
}
public boolean contains(long hash) {
long quotient = hash >>> r;
long remainder = hash & ((1L << r) - 1);
int index = (int)quotient;
long slot = filter[index];
return isOccupied(slot) && getRemainder(slot) == remainder;
}
private boolean isOccupied(long slot) {
return (slot & 0x8000000000000000L) != 0;
}
private long setOccupied(long slot) {
return slot | 0x8000000000000000L;
}
private long getRemainder(long slot) {
return slot & ((1L << r) - 1);
}
private long setRemainder(long slot, long remainder) {
return (slot & ~((1L << r) - 1)) | remainder;
}
}
Topic 3: Bloom Filter Compressionโ
Compress Bloom filters for storage and transmission.
public class CompressedBloomFilter {
private final BloomFilter<String> filter;
public CompressedBloomFilter(int expectedElements, double falsePositiveRate) {
this.filter = BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
expectedElements,
falsePositiveRate
);
}
public byte[] serialize() throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
// Write filter metadata
dos.writeInt(filter.approximateElementCount());
dos.writeDouble(filter.expectedFpp());
// Write bit array
BitSet bitSet = getBitSet(filter);
byte[] bytes = bitSet.toByteArray();
// Compress
byte[] compressed = compress(bytes);
dos.writeInt(compressed.length);
dos.write(compressed);
dos.close();
return baos.toByteArray();
}
public void deserialize(byte[] data) throws IOException {
ByteArrayInputStream bais = new ByteArrayInputStream(data);
DataInputStream dis = new DataInputStream(bais);
// Read metadata
int elementCount = dis.readInt();
double fpp = dis.readDouble();
// Read compressed data
int compressedLength = dis.readInt();
byte[] compressed = new byte[compressedLength];
dis.readFully(compressed);
// Decompress
byte[] decompressed = decompress(compressed);
// Reconstruct filter
BitSet bitSet = BitSet.valueOf(decompressed);
reconstructFilter(bitSet, elementCount, fpp);
dis.close();
}
private byte[] compress(byte[] data) {
// Use GZIP compression
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
GZIPOutputStream gzip = new GZIPOutputStream(baos);
gzip.write(data);
gzip.close();
return baos.toByteArray();
} catch (IOException e) {
throw new RuntimeException("Compression failed", e);
}
}
private byte[] decompress(byte[] compressed) {
// Use GZIP decompression
try {
ByteArrayInputStream bais = new ByteArrayInputStream(compressed);
GZIPInputStream gzip = new GZIPInputStream(bais);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int len;
while ((len = gzip.read(buffer)) > 0) {
baos.write(buffer, 0, len);
}
gzip.close();
return baos.toByteArray();
} catch (IOException e) {
throw new RuntimeException("Decompression failed", e);
}
}
}
Topic 4: Bloom Filter Monitoringโ
Monitor Bloom filter health and performance.
@Service
public class BloomFilterMonitor {
private final BloomFilter<String> filter;
private final MetricsCollector metrics;
@Scheduled(fixedRate = 60000) // Every minute
public void monitorFilter() {
// Calculate actual false positive rate
double actualFpp = calculateActualFpp();
// Record metrics
metrics.gauge("bloom.filter.size", filter.approximateElementCount());
metrics.gauge("bloom.filter.fpp", actualFpp);
metrics.gauge("bloom.filter.expected.fpp", filter.expectedFpp());
// Alert if FPP is too high
if (actualFpp > filter.expectedFpp() * 2) {
alert("Bloom filter FPP is too high: " + actualFpp);
}
}
private double calculateActualFpp() {
// Test with known non-existent elements
int testSize = 10000;
int falsePositives = 0;
for (int i = 0; i < testSize; i++) {
String testKey = "test_" + i + "_" + UUID.randomUUID();
if (filter.mightContain(testKey)) {
falsePositives++;
}
}
return (double) falsePositives / testSize;
}
}
Topic 5: Bloom Filter Rebuildingโ
Periodically rebuild Bloom filters to maintain accuracy.
@Service
public class BloomFilterRebuilder {
private final BloomFilter<String> filter;
private final DataSource dataSource;
@Scheduled(cron = "0 0 2 * * ?") // Every day at 2 AM
public void rebuildFilter() {
log.info("Rebuilding Bloom filter");
// Get all current keys from database
List<String> allKeys = getAllKeys();
// Create new filter
BloomFilter<String> newFilter = BloomFilter.create(
Funnels.stringFunnel(StandardCharsets.UTF_8),
allKeys.size(),
0.01
);
// Populate new filter
allKeys.forEach(newFilter::put);
// Atomically replace old filter
replaceFilter(newFilter);
log.info("Bloom filter rebuilt with {} elements", allKeys.size());
}
private List<String> getAllKeys() {
// Query database for all keys
return jdbcTemplate.query(
"SELECT id FROM items",
(rs, rowNum) -> rs.getString("id")
);
}
private synchronized void replaceFilter(BloomFilter<String> newFilter) {
// Replace filter atomically
this.filter.clear();
newFilter.copyTo(this.filter);
}
}
๐ Additional Resourcesโ
Academic Papersโ
- "Space/Time Trade-offs in Hash Coding with Allowable Errors" by Burton H. Bloom (1970)
- "An Optimal Algorithm for Approximate Member Matching" by Andrei Broder and Michael Mitzenmacher
- "Cuckoo Filter: Practically Better Than Bloom" by Bin Fan et al.
Booksโ
- "Introduction to Algorithms" by Cormen et al. (Chapter on Hash Tables)
- "Algorithms" by Sedgewick and Wayne
- "Beautiful Code" by O'Reilly Media
Librariesโ
- Guava BloomFilter - Java implementation
- pybloom-live - Python implementation
- bloomfilter - Go implementation
Online Resourcesโ
- Bloom Filter Calculator - Calculate optimal parameters
- Bloom Filter Visualization - Interactive visualization
- System Design Primer - Bloom filter use cases
๐ฏ Best Practicesโ
- Choose Right Parameters: Calculate optimal size and hash count based on expected elements and desired false positive rate
- Use Quality Hash Functions: Use MurmurHash3, xxHash, or similar for good distribution
- Monitor False Positive Rate: Track actual FPP and rebuild if it exceeds threshold
- Handle False Positives: Always verify with a secondary data structure
- Plan for Growth: Use scalable filters or periodic rebuilding for growing datasets
- Consider Alternatives: Evaluate Cuckoo filters, Quotient filters for specific use cases
- Test Thoroughly: Validate filter behavior with synthetic and real data
- Document Trade-offs: Maintain architecture decision records for filter choices
- Implement Monitoring: Track filter size, FPP, and performance metrics
- Plan for Rebuilding: Schedule periodic filter rebuilding to maintain accuracy
Position Bloom filters as a memory optimization tool for permission checks, existence verification, and preventing unnecessary downstream queries. Emphasize the zero false negatives guarantee (precision: never miss real data) and acceptable low false positive rates (recall: occasionally check DB on false positive). Practice articulating space vs. accuracy tradeoffs and real-world scenarios (cache penetration, dedup, URL crawling).
- Trap: "Bloom filters support efficient deletion." Reality: Only Counting Bloom filters support deletion; standard filters don't. Trade space for this capability.
- Trap: "False positives can be completely eliminated." Reality: Trade-off with size. Reducing FP rate to 0.001% requires 3ร more bits. Know the formula: m โ -n*log(p) / (ln2)ยฒ.
- Trap: "Use Bloom filters when you need fast removal of old elements." Reality: Standard Bloom doesn't support removal. Use Counting Bloom (4ร space) or periodically rebuild with active data only.
- Trap: "Bloom filters are better than caching everywhere." Reality: They're only useful when false positive handling is acceptable (e.g., recheck DB rare cases). Not suitable for precise answers (e.g., authorization checks without fallback).