Skip to main content

NoSQL & Distributed Databases

Who this guide is for

Why NoSQL?โ€‹

The Problem with Pure SQL at Scaleโ€‹

Imagine you are building Twitter. Every user has followers, posts, likes, and hashtags. In PostgreSQL you'd end up with something like:

users โ†’ posts โ†’ likes โ†’ hashtags โ†’ post_hashtags

Now imagine 400 million daily active users. What breaks?

  1. Schema is rigid โ€” Adding a poll field to posts means an ALTER TABLE that locks 10 billion rows for hours.
  2. Joins get expensive โ€” Rendering a single timeline requires joining users, posts, follows, and likes. At scale this kills latency.
  3. Vertical scaling hits a wall โ€” You can only put so much RAM and CPU in one server. A single PostgreSQL instance tops out around 100K writes/sec under ideal conditions.
  4. Everything lives in one box โ€” One region, one failure domain.

NoSQL databases emerged to solve these specific problems. They are not "better" than SQL โ€” they make different trade-offs:

They give up...To gain...
Strict ACID guaranteesHorizontal write scalability
Rigid schemasSchema flexibility / evolution
Universal query patternsSpeed for a specific access pattern
JoinsSimpler distributed architecture

Mental model: SQL is a Swiss Army knife โ€” great for many tasks, good at all of them. NoSQL databases are specialist tools โ€” excellent at their specific job, intentionally limited elsewhere.


NoSQL Categories at a Glanceโ€‹

CategoryThink of it as...ExamplesBest For
Key-ValueDistributed HashMapRedis, DynamoDB, MemcachedSessions, caching, leaderboards
DocumentDistributed JSON file cabinetMongoDB, Couchbase, FirestoreCatalogs, user profiles, CMS
Wide-ColumnDistributed sorted map of mapsCassandra, HBase, BigtableTime-series, event logs, IoT
GraphWhiteboard diagram, queryableNeo4j, Amazon NeptuneSocial graphs, fraud detection
Time-SeriesAppend-only metrics logInfluxDB, TimescaleDBMonitoring, IoT sensors
SearchIndexed text engineElasticsearch, OpenSearchFull-text search, log analytics

Key-Value Stores (Redis)โ€‹

What is it?โ€‹

The simplest possible model: a giant distributed hash map. You store a value under a key; you retrieve it by that key. The database has no idea what's inside the value โ€” it's just bytes to store and return.

SET key value [expiry]
GET key
DEL key

Why use it?โ€‹

  • Sub-millisecond latency โ€” everything lives in RAM
  • Dead simple API โ€” no schema, no migrations
  • Horizontal scaling โ€” cluster mode shards keys across nodes

When NOT to use itโ€‹

  • You need to query by value fields (e.g., "find all users where country = VN")
  • Data is too large to fit in RAM
  • You need multi-row ACID transactions

Redis โ€” Beyond Simple Get/Setโ€‹

Redis extends the pure key-value model with rich data structures. This is what makes it uniquely useful:

# โ”€โ”€ String โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
SET visits:homepage 1042
INCR visits:homepage # atomic increment โ†’ 1043
INCRBY visits:homepage 10 # โ†’ 1053

# โ”€โ”€ Hash (object fields without fetching the whole thing) โ”€โ”€
HSET user:1001 name "Alice" email "[email protected]" country "VN"
HGET user:1001 name # โ†’ "Alice"
HMGET user:1001 name country # โ†’ ["Alice", "VN"]

# โ”€โ”€ List (queue / timeline) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
LPUSH notifications:1001 "You have a new follower"
LPUSH notifications:1001 "Your post was liked"
LRANGE notifications:1001 0 9 # latest 10 notifications

# โ”€โ”€ Set (unique membership) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
SADD post:555:likes "user:1" "user:2" "user:3"
SCARD post:555:likes # count โ†’ 3
SISMEMBER post:555:likes "user:2" # โ†’ 1 (true)

# โ”€โ”€ Sorted Set (leaderboard / ranked feed) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
ZADD leaderboard 5000 "alice"
ZADD leaderboard 8200 "bob"
ZADD leaderboard 7100 "carol"
ZREVRANGE leaderboard 0 9 WITHSCORES # top 10 with scores
ZRANK leaderboard "alice" # rank (0-indexed from bottom)
ZREVRANK leaderboard "alice" # rank from top โ†’ 2

Spring Boot + Redis (Practical)โ€‹

// application.yml
spring:
data:
redis:
host: localhost
port: 6379
timeout: 2000ms
lettuce:
pool:
max-active: 10
max-idle: 5

// --- Simple cache-aside pattern ---
@Service
@RequiredArgsConstructor
public class ProductService {

private final ProductRepository repo;
private final RedisTemplate<String, Product> redisTemplate;
private static final Duration TTL = Duration.ofMinutes(30);

public Product findById(Long id) {
String key = "product:" + id;
ValueOperations<String, Product> ops = redisTemplate.opsForValue();

Product cached = ops.get(key);
if (cached != null) return cached;

Product product = repo.findById(id)
.orElseThrow(() -> new NotFoundException("Product " + id));

ops.set(key, product, TTL);
return product;
}

public void invalidate(Long id) {
redisTemplate.delete("product:" + id);
}
}

// --- Spring Cache abstraction (simpler, less control) ---
@Service
public class UserService {

@Cacheable(value = "users", key = "#id", unless = "#result == null")
public User findById(Long id) { /* hits DB only on cache miss */ }

@CacheEvict(value = "users", key = "#user.id")
public User update(User user) { /* evicts on update */ }

@CachePut(value = "users", key = "#result.id")
public User create(User user) { /* writes through to cache */ }
}
Cache-Aside vs Write-Through

Cache-aside (lazy loading): populate cache only on read miss. Risk: cache stampede on cold start. Write-through: write to cache on every DB write. Risk: cache bloat with rarely-read data. Write-behind: write to cache first, async flush to DB. Risk: data loss on crash.


Document Databases (MongoDB)โ€‹

What is it?โ€‹

A document database stores data as self-contained documents โ€” typically JSON or a binary variant of it (BSON). Each document can have a different structure. There is no table schema to define upfront.

// No predefined schema โ€” each document can look different
{ "_id": "usr_001", "name": "Alice", "plan": "premium", "verified": true }
{ "_id": "usr_002", "name": "Bob", "plan": "free", "region": "VN", "referrer": "carol" }

Why use it?โ€‹

  1. Domain objects map naturally โ€” A Java User with a nested List<Address> becomes a single document. No ORM gymnastics.
  2. Schema evolution is cheap โ€” Adding a new field to documents doesn't require ALTER TABLE.
  3. Rich queries inside documents โ€” Unlike Redis, you can query and index on any field inside the document.
  4. Horizontal sharding โ€” MongoDB shards collections across nodes using a shard key.

Real Use Casesโ€‹

DomainWhy MongoDB fits
E-commerce product catalogEach product category has different attributes (a shirt has size/color, a laptop has RAM/CPU)
User profilesArbitrary preferences, feature flags, nested metadata vary per user
CMS / blogArticles have variable fields; comments embedded naturally
IoT device dataDevice telemetry schema varies per device type
Game player stateComplex nested state (inventory, quests, achievements)

MongoDB Data Model โ€” The Core Decisionโ€‹

The most important design decision in MongoDB is embed vs reference:

Embed โ†’ store related data inside the parent document (like JOIN already done)
Reference โ†’ store only an ID and look up separately (like a foreign key)
// โ”€โ”€ EMBEDDED: Order items inside an order document โ”€โ”€โ”€โ”€โ”€โ”€
{
"_id": "ORD-001",
"userId": "usr_001",
"status": "shipped",
"items": [
{ "productId": "prod_abc", "name": "MacBook Air", "qty": 1, "price": 1299.00 },
{ "productId": "prod_xyz", "name": "USB-C Hub", "qty": 2, "price": 29.99 }
],
"shippingAddress": {
"street": "123 Nguyen Van Linh",
"city": "Ho Chi Minh City",
"country": "VN"
},
"createdAt": "2024-11-01T08:30:00Z"
}

// โ”€โ”€ REFERENCED: Blog post references author by ID โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
{
"_id": "post_001",
"title": "Understanding NoSQL",
"authorId": "usr_001", // โ† just an ID, not embedded
"tags": ["database", "nosql"],
"content": "..."
}

Decision rules:

Embed when...Reference when...
Data is always read togetherData is accessed independently
1-to-few relationship (< ~20 items)1-to-many (unbounded list)
Child has no life beyond parentChild is updated frequently on its own
You want single-document atomicityMany parents share the same child
The Unbounded Array Anti-Pattern

Never embed a list that can grow without limit. A comments array inside a blog post document will eventually hit MongoDB's 16MB document limit or cause massive read amplification.

Fix: create a separate comments collection and reference postId.

MongoDB CRUD & Aggregationโ€‹

// โ”€โ”€ Insert โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
db.products.insertOne({
name: "MacBook Air M3",
price: 1299,
category: "laptop",
specs: { ram: 16, storage: 512, chip: "M3" },
tags: ["apple", "ultrabook"]
})

// โ”€โ”€ Find with projection โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
db.products.find(
{ category: "laptop", price: { $lte: 1500 } },
{ name: 1, price: 1, _id: 0 } // project: include name & price, exclude _id
).sort({ price: 1 }).limit(10)

// โ”€โ”€ Update โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
db.products.updateOne(
{ _id: ObjectId("64abc...") },
{
$set: { price: 1199 }, // set specific fields
$push: { tags: "sale" }, // append to array
$inc: { "stats.viewCount": 1 } // atomic increment nested field
}
)

// โ”€โ”€ Aggregation Pipeline โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
// "What is the average order value per customer, for orders shipped in 2024?"
db.orders.aggregate([
{ $match: {
status: "shipped",
createdAt: { $gte: ISODate("2024-01-01"), $lt: ISODate("2025-01-01") }
}},
{ $group: {
_id: "$userId",
avgOrderValue: { $avg: "$total" },
orderCount: { $sum: 1 }
}},
{ $lookup: { // JOIN users collection
from: "users",
localField: "_id",
foreignField: "_id",
as: "user"
}},
{ $unwind: "$user" },
{ $project: {
customerName: "$user.name",
avgOrderValue: { $round: ["$avgOrderValue", 2] },
orderCount: 1
}},
{ $sort: { avgOrderValue: -1 } },
{ $limit: 20 }
])

Spring Data MongoDB โ€” Full Exampleโ€‹

// โ”€โ”€ Domain model โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
@Document(collection = "products")
@CompoundIndex(name = "cat_price_idx", def = "{'category': 1, 'price': 1}")
public class Product {

@Id
private String id;

@Indexed
private String category;

private String name;
private BigDecimal price;
private ProductSpecs specs; // embedded document
private List<String> tags;

@CreatedDate
private Instant createdAt;

@LastModifiedDate
private Instant updatedAt;
}

@Value // Lombok immutable
public class ProductSpecs {
int ram;
int storage;
String chip;
}

// โ”€โ”€ Repository โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
@Repository
public interface ProductRepository extends MongoRepository<Product, String> {

// Derived query โ€” Spring generates the query from method name
List<Product> findByCategoryAndPriceLessThanEqual(String category, BigDecimal maxPrice);

// Custom query with @Query
@Query("{ 'tags': { $in: ?0 }, 'price': { $lte: ?1 } }")
List<Product> findByTagsInAndPriceAtMost(List<String> tags, BigDecimal maxPrice);

// Pagination
Page<Product> findByCategory(String category, Pageable pageable);

// Exists / count
boolean existsByName(String name);
long countByCategory(String category);
}

// โ”€โ”€ Complex queries with MongoTemplate โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
@Service
@RequiredArgsConstructor
public class ProductQueryService {

private final MongoTemplate mongo;

public List<CategorySummary> getTopCategoriesByRevenue(int topN) {
Aggregation agg = newAggregation(
match(where("status").is("sold")),
group("category")
.sum("price").as("totalRevenue")
.count().as("unitsSold"),
sort(Sort.by(DESC, "totalRevenue")),
limit(topN),
project("totalRevenue", "unitsSold")
.and("_id").as("category")
);

return mongo.aggregate(agg, "products", CategorySummary.class)
.getMappedResults();
}

public void incrementViewCount(String productId) {
Query query = query(where("id").is(productId));
Update update = new Update().inc("stats.viewCount", 1);
mongo.updateFirst(query, update, Product.class);
}

// Bulk write โ€” efficient for batch updates
public BulkWriteResult applyDiscount(String category, double discountPct) {
BulkOperations ops = mongo.bulkOps(BulkMode.UNORDERED, Product.class);

Query q = query(where("category").is(category));
Update u = new Update().multiply("price", 1 - discountPct / 100);
ops.updateMulti(q, u);

return ops.execute();
}
}

Indexing Strategyโ€‹

// โ”€โ”€ Single field index โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
@Indexed(unique = true)
private String email;

// โ”€โ”€ Compound index (order matters!) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
// Supports: category, category+price, category+price+status
// Does NOT support: price alone, status alone
@CompoundIndex(def = "{'category': 1, 'price': 1, 'status': 1}")

// โ”€โ”€ Text index for full-text search โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
@TextIndexed(weight = 2) // higher weight = more relevance
private String name;

@TextIndexed
private String description;

// Query: db.products.find({ $text: { $search: "macbook air" } })

// โ”€โ”€ TTL index โ€” auto-delete expired documents โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
@Indexed(expireAfterSeconds = 3600)
private Instant expiresAt; // document deleted after this time + 1hr

// โ”€โ”€ Wildcard index โ€” for dynamic fields โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
// Useful for product specs that differ per category
// db.products.createIndex({ "specs.$**": 1 })
Index Design Rules
  1. Index fields you filter or sort on frequently
  2. Compound index field order: equality fields first, then range/sort fields
  3. Don't over-index: each index costs write performance and RAM
  4. Use explain("executionStats") to verify index usage before going to production

Wide-Column Stores (Cassandra)โ€‹

What is it?โ€‹

Think of Cassandra as a distributed, sorted map of maps:

Table = Map<PartitionKey, SortedMap<ClusteringKey, Row>>

Data is spread across nodes by hashing the partition key. Within a partition, rows are physically sorted by the clustering key.

Why use it?โ€‹

  • Massive write throughput โ€” Cassandra can sustain millions of writes/second across a cluster
  • Linear horizontal scaling โ€” add nodes, get proportionally more capacity
  • No single point of failure โ€” truly masterless, every node is equal
  • Tunable consistency โ€” trade consistency for speed per operation

When to use Cassandraโ€‹

โœ… Great fitโŒ Poor fit
Time-series data (IoT, metrics, events)Ad-hoc queries with unknown access patterns
Append-heavy workloadsFrequent updates to existing rows
Multi-region active-active writesComplex transactions across many rows
Known, fixed access patternsAggregations like SUM/GROUP BY at scale

Data Model โ€” Think in Queries Firstโ€‹

In SQL you normalize first and query later. In Cassandra, start with your queries and design tables to serve exactly those queries.

-- Query 1: "Get all orders for user X, newest first"
CREATE TABLE orders_by_user (
user_id UUID,
created_at TIMESTAMP,
order_id UUID,
total DECIMAL,
status TEXT,
items LIST<FROZEN<order_item>>,
PRIMARY KEY ((user_id), created_at, order_id)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Query 2: "Get all orders with status=shipped (for ops dashboard)"
-- Needs a SEPARATE table โ€” Cassandra doesn't filter by non-primary-key columns efficiently
CREATE TABLE orders_by_status (
status TEXT,
created_at TIMESTAMP,
order_id UUID,
user_id UUID,
total DECIMAL,
PRIMARY KEY ((status), created_at, order_id)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Efficient: hits exactly one partition
SELECT * FROM orders_by_user
WHERE user_id = ? AND created_at >= ? AND created_at <= ?;

Consistency Levelsโ€‹

Replication factor N = 3 (data exists on 3 nodes)

WRITE consistency levels:
ANY โ†’ at least 1 node (including hinted handoff) acked โ€” fastest, weakest
ONE โ†’ at least 1 replica acked
QUORUM โ†’ majority (โŒŠN/2โŒ‹ + 1 = 2) acked โ†’ strong with QUORUM reads
ALL โ†’ all 3 replicas acked โ€” strongest, slowest, unavailable if any node down

READ consistency levels:
ONE โ†’ 1 replica responds (may be stale)
QUORUM โ†’ majority responds, coordinator picks freshest value
ALL โ†’ all replicas respond

Strong consistency: W + R > N
QUORUM + QUORUM = 2 + 2 > 3 โœ…
ONE + ONE = 1 + 1 > 3 โŒ (stale reads possible)

Graph Databases (Neo4j)โ€‹

What is it?โ€‹

A graph database stores data as nodes (things) and edges (relationships between things), both with arbitrary properties.

(Alice:User)-[:FOLLOWS {since: "2023-01"}]->(Bob:User)
(Bob:User)-[:AUTHORED]->(Post:Post {title: "NoSQL Guide"})
(Alice:User)-[:LIKED]->(Post)

Why use it?โ€‹

In SQL, "find all friends of friends" requires a self-join โ€” and "friends 3 hops away" means 3 self-joins, which explodes exponentially. In a graph DB, each hop is O(1) โ€” just follow an edge pointer.

Use Casesโ€‹

  • Social networks โ€” friends, followers, mutual connections
  • Fraud detection โ€” find accounts connected to known fraud accounts within N hops
  • Recommendation engines โ€” "users who bought X also bought Y"
  • Knowledge graphs โ€” semantic relationships between concepts
  • Network topology โ€” which servers depend on which services
// Friends of friends I don't already follow
MATCH (me:User {id: $userId})-[:FOLLOWS]->(:User)-[:FOLLOWS]->(fof:User)
WHERE NOT (me)-[:FOLLOWS]->(fof) AND me <> fof
RETURN fof.name, COUNT(*) AS mutualFriends
ORDER BY mutualFriends DESC LIMIT 10;

// Fraud ring detection โ€” accounts connected within 3 hops to a known fraudster
MATCH path = (suspect:Account {flagged: true})-[:TRANSFERRED_TO|SHARES_DEVICE*1..3]->(acc:Account)
WHERE acc.flagged = false
RETURN acc, length(path) AS hops
ORDER BY hops;

// Shortest path between two users
MATCH p = shortestPath(
(alice:User {name: "Alice"})-[*]-(bob:User {name: "Bob"})
)
RETURN p, length(p) AS degrees;

DynamoDB (AWS)โ€‹

What is it?โ€‹

DynamoDB is AWS's fully managed, serverless key-value and document database. You pay per read/write unit, not per server.

Primary key is either:

  • Partition Key only (simple primary key) โ€” like userId
  • Partition Key + Sort Key (composite primary key) โ€” like userId + orderId

Single-Table Designโ€‹

The most powerful (and mind-bending) DynamoDB pattern. Store multiple entity types in one table using generic PK/SK attributes:

PK | SK | GSI1PK | Attributes
------------------|------------------|-----------------|------------------
USER#alice | PROFILE | STATUS#active | name, email, plan
USER#alice | ORDER#2024-001 | ORDER#2024-001 | total, status
USER#alice | ORDER#2024-002 | ORDER#2024-002 | total, status
PRODUCT#mac-air | DETAILS | CAT#laptop | price, stock
PRODUCT#mac-air | REVIEW#rev-001 | | rating, comment

Access patterns all served from one table:

  • Get user profile โ†’ PK = USER#alice, SK = PROFILE
  • Get all orders for user โ†’ PK = USER#alice, SK begins_with ORDER#
  • Get all orders (global) โ†’ query GSI1 where GSI1PK = ORDER#...

Spring + DynamoDB (Enhanced Client)โ€‹

@DynamoDbBean
public class Order {

private String pk; // USER#<userId>
private String sk; // ORDER#<orderId>
private BigDecimal total;
private String status;
private Instant createdAt;

@DynamoDbPartitionKey
public String getPk() { return pk; }

@DynamoDbSortKey
public String getSk() { return sk; }
}

@Service
@RequiredArgsConstructor
public class OrderDynamoService {

private final DynamoDbEnhancedClient dynamo;
private final DynamoDbTable<Order> table;

public List<Order> getOrdersForUser(String userId) {
QueryConditional condition = QueryConditional
.sortBeginsWith(Key.builder()
.partitionValue("USER#" + userId)
.sortValue("ORDER#")
.build());

return table.query(condition)
.items()
.stream()
.toList();
}
}

BASE vs ACIDโ€‹

Most NoSQL systems trade strict relational guarantees for scale and horizontal availability by following the BASE model instead of the classic ACID model:

ACID BASE
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Atomic โ€” all or nothing Basically Available
Consistent โ€” always valid state Soft State โ€” may be stale
Isolated โ€” transactions Eventually Consistent
Durable โ€” survives crashes

For a comprehensive comparison of how these properties differ under concurrency, failure modes, and architectural designs, see the Database ACID Properties guide.

Eventually consistent means: if no new writes come in, all replicas will eventually converge to the same value. But there's a window where reads may return stale data.

Timeline:
t=0 Writer updates X=5 on Node A
t=1 Reader queries Node B โ†’ returns X=3 โ† stale!
t=2 Replication propagates to Node B
t=3 Reader queries Node B โ†’ returns X=5 โ† consistent

Application strategies to handle eventual consistency:

// 1. Read-your-writes: always read from same node you wrote to
// DynamoDB: use strongly consistent reads immediately after writes
GetItemRequest req = GetItemRequest.builder()
.tableName("orders")
.key(key)
.consistentRead(true) // โ† forces consistent read, doubles RCU cost
.build();

// 2. Optimistic concurrency: use version numbers to detect stale writes
@Document
public class BankAccount {
@Id private String id;
private BigDecimal balance;

@Version // Spring Data auto-increments, throws on conflict
private Long version;
}

// 3. UI messaging: set expectations with the user
// "Your changes are saved. They may take a moment to appear everywhere."

๐Ÿ”ฌ Deep Dive: MongoDB Internalsโ€‹

For senior engineers who need to reason about performance and correctness at depth.

Storage Engine: WiredTigerโ€‹

MongoDB uses WiredTiger by default. Key characteristics:

  • B-Tree for indexes โ€” every index is a separate B-Tree; _id index always exists
  • Document-level concurrency โ€” multiple writes to different documents in the same collection don't block each other
  • MVCC (Multi-Version Concurrency Control) โ€” readers don't block writers; each reader sees a consistent snapshot
  • Compression โ€” documents are snappy/zstd compressed on disk by default
Write path:
1. Write to in-memory cache (WiredTiger cache, default 50% of RAM)
2. Write to journal (WAL file) for durability โ€” synced every 100ms by default
3. Async checkpoint to data files every 60 seconds

Read path:
1. Check WiredTiger cache
2. If miss: read from data files into cache
3. If cache full: evict least-recently-used pages (can cause latency spikes!)
WiredTiger Cache Pressure

If your working set (hot data) exceeds the WiredTiger cache, you'll see latency spikes as MongoDB evicts pages and reads from disk. Monitor serverStatus.wiredTiger.cache["pages read into cache"] โ€” a high rate signals cache thrashing.

Fix: increase storage.wiredTiger.engineConfig.cacheSizeGB, upgrade RAM, or archive cold data.

Replication: Replica Setsโ€‹

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Replica Set โ”‚
โ”‚ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” Replication โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Primary โ”‚ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ โ”‚Secondaryโ”‚ โ”‚
โ”‚ โ”‚ (writes)โ”‚ oplog โ”‚(reads) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”‚ oplog โ”‚
โ”‚ โ–ผ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚Arbiter โ”‚ โ”‚
โ”‚ โ”‚Secondaryโ”‚ โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ”‚(votes) โ”‚ โ”‚
โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  • oplog (operations log): a capped collection on the primary recording every write as an idempotent operation
  • Replication lag: secondaries apply oplog entries asynchronously โ€” lag can be seconds to minutes under heavy load
  • Automatic failover: if primary is unreachable for electionTimeoutMillis (default 10s), secondaries elect a new primary
  • Write concern: controls how many nodes must acknowledge a write
// Write concern โ€” controls durability guarantee
MongoClientSettings settings = MongoClientSettings.builder()
.writeConcern(WriteConcern.MAJORITY) // waits for majority ack before returning
// vs WriteConcern.W1 โ€” ack from primary only (faster, less safe)
// vs WriteConcern.UNACKNOWLEDGED โ€” fire-and-forget (dangerous)
.readPreference(ReadPreference.secondaryPreferred()) // read from secondaries when possible
.readConcern(ReadConcern.MAJORITY) // only return data acked by majority
.build();

Shardingโ€‹

Sharding splits a collection across multiple mongod instances (shards) using a shard key:

mongos (query router)
/ | \
Shard A Shard B Shard C
[a - h] [i - p] [q - z] โ† hashed ranges
// Choosing a shard key โ€” critical, cannot be changed without resharding

// โŒ Bad: low cardinality โ€” all docs end up on one shard ("hotspot")
{ shardKey: "status" } // only 3 values: active/inactive/banned

// โŒ Bad: monotonically increasing โ€” all writes go to the last shard
{ shardKey: "_id" } // ObjectId is time-ordered

// โœ… Good: high cardinality + even distribution
{ shardKey: "userId" } // millions of users, writes spread evenly

// โœ… Good for time-series: compound shard key
{ shardKey: { "deviceId": 1, "timestamp": 1 } }
// deviceId spreads writes; timestamp keeps time-range queries on one shard

Transactions (ACID across documents)โ€‹

MongoDB 4.0+ supports multi-document ACID transactions on replica sets:

@Service
@RequiredArgsConstructor
public class TransferService {

private final MongoClient client;
private final MongoTemplate mongo;

public void transfer(String fromId, String toId, BigDecimal amount) {
ClientSession session = client.startSession();
session.startTransaction(TransactionOptions.builder()
.writeConcern(WriteConcern.MAJORITY)
.readConcern(ReadConcern.SNAPSHOT)
.build());
try {
Query fromQuery = query(where("id").is(fromId));
Query toQuery = query(where("id").is(toId));

// Both operations inside one ACID transaction
mongo.updateFirst(fromQuery,
new Update().inc("balance", amount.negate()), Account.class);
mongo.updateFirst(toQuery,
new Update().inc("balance", amount), Account.class);

session.commitTransaction();
} catch (Exception e) {
session.abortTransaction();
throw e;
} finally {
session.close();
}
}
}
Transactions Are Expensive in MongoDB

Use transactions sparingly. MongoDB is optimized for single-document operations (which are always atomic). Multi-document transactions:

  • Add latency (distributed locking)
  • Abort if they run longer than 60 seconds
  • Hurt throughput at scale

Prefer embedding related data so one write covers everything atomically.

Change Streams โ€” React to Data Changesโ€‹

// Listen to all inserts/updates in the orders collection
@Component
@RequiredArgsConstructor
public class OrderChangeListener implements CommandLineRunner {

private final MongoClient client;

@Override
public void run(String... args) {
MongoCollection<Document> orders =
client.getDatabase("shop").getCollection("orders");

List<Bson> pipeline = List.of(
Aggregates.match(
Filters.in("operationType", List.of("insert", "update"))
)
);

// Non-blocking: use reactive MongoDB driver in production
orders.watch(pipeline).forEach(event -> {
String type = event.getOperationType().getValue();
Document full = event.getFullDocument();
log.info("Order {} was {}", full.get("_id"), type);
// Trigger downstream: notify warehouse, update analytics, etc.
});
}
}

โšก Advanced Patternsโ€‹

Outbox Pattern (MongoDB + Kafka)โ€‹

Guarantee that a DB write and a message publication are atomic โ€” even if Kafka is down.

Deep Dive: Outbox Pattern

The standard way to avoid the dual-write problem is the Transactional Outbox Pattern. For a complete guide with code examples, polling vs CDC (Debezium) trade-offs, and failure mitigation, see the dedicated Transactional Outbox Pattern Guide.

Materialized View with Aggregation Pipeline on a Scheduleโ€‹

// Pre-compute expensive analytics into a summary collection
// Run nightly via @Scheduled or a cron job

public void refreshDailySalesSummary() {
Aggregation agg = newAggregation(
match(where("status").is("completed")
.and("createdAt").gte(startOfDay())),
group("category")
.sum("total").as("revenue")
.count().as("orderCount")
.avg("total").as("avgOrderValue"),
project("revenue", "orderCount", "avgOrderValue")
.and("_id").as("category")
.and(DateOperators.dateOf("$$NOW")).as("computedAt")
);

List<DailySummary> results = mongo
.aggregate(agg, "orders", DailySummary.class)
.getMappedResults();

// Replace entire summary collection atomically
mongo.dropCollection("daily_sales_summary");
mongo.insertAll(results);
}

Bucket Pattern (for Time-Series in MongoDB)โ€‹

Instead of one document per event (millions of tiny docs), group events into buckets:

// โŒ One document per sensor reading โ€” poor performance
{ "sensorId": "s1", "timestamp": "2024-11-01T00:00:01Z", "value": 22.3 }
{ "sensorId": "s1", "timestamp": "2024-11-01T00:00:02Z", "value": 22.4 }
// ... 86,400 docs/day/sensor

// โœ… Bucket: one document per sensor per hour
{
"sensorId": "s1",
"hour": "2024-11-01T00:00:00Z",
"count": 3600,
"min": 21.8, "max": 23.1, "sum": 80640.0,
"readings": [22.3, 22.4, 22.4, 22.5, ...] // array of 3600 values
}

Benefits: 3600ร— fewer documents, index is 3600ร— smaller, range queries scan far fewer docs.


โš ๏ธ Failure Modes and Operational Concernsโ€‹

MongoDBโ€‹

ProblemSymptomFix
Working set > RAMDisk reads spike, p99 latency balloonsAdd RAM or archive cold data
Index not usedCOLLSCAN in explain() outputAdd index or rewrite query
Unbounded arrayDocument approaching 16MB limitSplit into separate collection
Replication lagSecondaries fall behindReduce write load, add secondaries, investigate slow oplog appliers
Chunk imbalanceOne shard handles 80% of trafficChoose a better shard key, manual chunk migration
Too many collectionsSlow startup, RAM pressureConsolidate; each collection has a min WiredTiger overhead

Redisโ€‹

ProblemSymptomFix
Cache stampedeDB overwhelmed after cache expiryUse probabilistic early expiration or mutex lock
Memory evictionKeys silently deletedSet maxmemory-policy intentionally, monitor evicted_keys
Hot keyOne key receives all trafficUse key sharding (key:shard:{0..N})
Blocking commandsKEYS * or SORT blocks entire RedisUse SCAN instead of KEYS; never SORT large sets in prod

Cassandraโ€‹

ProblemSymptomFix
Hot partitionOne node at 100% CPUBetter partition key distribution
Tombstone accumulationRead latency grows over timeTune gc_grace_seconds, run compaction
Partition too largeReads slow, OOMKiller on nodesSplit partition (e.g., by time bucket)
Allow filteringALLOW FILTERING in queryCreate a dedicated table for that query pattern

Choosing the Right Database โ€” Decision Treeโ€‹

Start here: What is the primary access pattern?
โ”‚
โ”œโ”€ Cache a result / session / leaderboard?
โ”‚ โ””โ”€ Redis
โ”‚
โ”œโ”€ Full-text search / log analysis?
โ”‚ โ””โ”€ Elasticsearch / OpenSearch
โ”‚
โ”œโ”€ Graph traversal (friends-of-friends, fraud rings)?
โ”‚ โ””โ”€ Neo4j / Amazon Neptune
โ”‚
โ”œโ”€ Time-series (IoT, metrics, monitoring)?
โ”‚ โ””โ”€ InfluxDB / TimescaleDB / Cassandra
โ”‚
โ”œโ”€ Massive append-only writes + known query patterns?
โ”‚ โ””โ”€ Cassandra
โ”‚
โ”œโ”€ Complex relational queries + ACID transactions?
โ”‚ โ””โ”€ PostgreSQL / MySQL
โ”‚
โ”œโ”€ Flexible schema + rich document queries + moderate scale?
โ”‚ โ””โ”€ MongoDB
โ”‚
โ”œโ”€ AWS serverless + auto-scaling + single-digit ms latency?
โ”‚ โ””โ”€ DynamoDB
โ”‚
โ””โ”€ Multiple concerns in one system?
โ””โ”€ Probably MongoDB + Redis (cache layer) is 80% of use cases

๐ŸŽฏ Interview Questionsโ€‹

For New Learnersโ€‹

Q: What is NoSQL and why does it exist?

NoSQL databases emerged to solve problems that relational databases struggle with at scale: rigid schemas that are costly to change, the inability to scale horizontally across many servers, and poor fit for data shapes like documents, graphs, or time-series. NoSQL trades some ACID guarantees (like strict consistency) for flexibility, scalability, and specialized performance characteristics.

Q: What is the difference between a key-value store and a document store?

A key-value store treats values as opaque bytes โ€” you can only get/set by key, not query inside the value. A document store (like MongoDB) understands the structure of values (JSON) and lets you query, index, and update individual fields within documents.

Q: What is eventual consistency?

In an eventually consistent system, when you write to one node, other nodes may briefly return the old value until replication completes. The guarantee is that all nodes will eventually agree, but there's a window where reads may be stale.

For Senior Engineersโ€‹

Q: How do you decide between embedding and referencing in MongoDB?

Embed when data is always read together, the relationship is 1-to-few, and the child has no independent lifecycle. Reference when the list is unbounded, the child is updated frequently on its own, or many parents share the same child. A key signal: if embedding would cause a document to grow indefinitely (comments, events, logs), use referencing.

Q: A MongoDB query that was fast last month is now slow. Walk me through your diagnosis.

First, run explain("executionStats") and look at: stage (COLLSCAN = no index used), totalDocsExamined vs totalDocsReturned (large ratio = poor selectivity), and executionTimeMillis. Check if an index exists with getIndexes(). If the index exists but isn't used, check field order in compound indexes (ESR rule: equality โ†’ sort โ†’ range). Check for collection scan due to type mismatch (string vs int in query). Finally, check if the WiredTiger cache is under pressure โ€” use db.serverStatus().wiredTiger.cache.

Q: How does Cassandra achieve high write throughput?

Cassandra writes are append-only to an in-memory structure (MemTable) plus a commit log (WAL). No random disk seeks. When MemTable is full, it's flushed as an immutable SSTable to disk. Reads must merge across SSTables (plus Bloom filters to skip irrelevant files). This trade-off makes writes fast at the cost of more complex reads and compaction overhead.

Q: When would you use MongoDB transactions, and what are the trade-offs?

Use multi-document transactions when atomicity across multiple documents/collections is a hard business requirement (e.g., transferring funds between accounts). The trade-offs: transactions acquire locks, increasing latency and reducing throughput; they abort after 60 seconds; and they add complexity. The preferred approach is to design documents so a single write covers all related data atomically (embedding), reserving transactions for cases where that's truly not possible.


Summary Cheat Sheetโ€‹

DatabaseModelConsistencyHorizontal ScaleJava/Spring Integration
MongoDBDocumentTunable (majority/local)โœ… Shardingspring-boot-starter-data-mongodb
RedisKey-Value + StructuresSingle-node: strong; Cluster: eventualโœ… Cluster/Sentinelspring-boot-starter-data-redis
CassandraWide-ColumnTunable (ONE/QUORUM/ALL)โœ… Linearspring-boot-starter-data-cassandra
Neo4jGraphStrong (single instance)Limitedspring-boot-starter-data-neo4j
DynamoDBKV + DocumentEventual / Strong (cost)โœ… ServerlessAWS SDK v2 + aws-sdk-java-v2

Further Readingโ€‹