Data Consistency & Transactions
Ensuring data remains correct and coherent across distributed systems is one of the hardest challenges in system design.
Table of Contentsโ
- Beginner View
- Distributed Consistency Patterns
- Conflict Resolution (Multi-Master)
- Transactional Outbox Pattern
- Quorum Reads and Read Repair
- Idempotency Patterns
- Locking Patterns: Local to Distributed
- How Transactions Work Internally
- Distributed Transactions & Workflows
- Consensus Algorithms
- Event Sourcing and CQRS
- Change Data Capture (CDC)
- Real-World Implementations
- Application Transaction Integration Patterns
- Pros and Cons
- Interview Questions
- Senior Deep Dive: Advanced Topics
- Additional Resources
- Best Practices
Beginner Viewโ
ACID Propertiesโ
| Property | Meaning | Example |
|---|---|---|
| Atomicity | All or nothing โ no partial writes | Transfer: debit + credit, both succeed or both fail |
| Consistency | DB moves from one valid state to another | Balance never goes negative (constraint enforced) |
| Isolation | Concurrent transactions don't interfere | Two transfers don't corrupt each other |
| Durability | Committed data survives crashes | Power loss doesn't lose committed transactions |
Spring Transaction Managementโ
@Transactional
public void transferMoney(Long fromId, Long toId, BigDecimal amount) {
Account from = accountRepository.findById(fromId).orElseThrow();
Account to = accountRepository.findById(toId).orElseThrow();
from.debit(amount); // Validates: throws if insufficient funds
to.credit(amount);
accountRepository.save(from);
accountRepository.save(to);
// If exception: entire transaction rolls back (atomicity)
}
// Propagation options
@Transactional(propagation = Propagation.REQUIRED) // Join existing or create new (default)
@Transactional(propagation = Propagation.REQUIRES_NEW) // Always new transaction
@Transactional(propagation = Propagation.NESTED) // Savepoint within existing
@Transactional(readOnly = true) // Read-only optimization
@Transactional(timeout = 5) // 5 second timeout
@Transactional(rollbackFor = BusinessException.class) // Rollback on checked exceptions too
BASE Properties (NoSQL)โ
| Property | Meaning |
|---|---|
| Basically Available | System available most of the time |
| Soft state | Data may be in transition |
| Eventual consistency | Will converge to consistent state |
Consistency Anomaliesโ
Dirty Readโ
T1: UPDATE balance = 500 (not yet committed)
T2: READ balance = 500 (reads uncommitted data)
T1: ROLLBACK
T2: Used wrong data!
Fixed by: READ COMMITTED isolation level.
Non-Repeatable Readโ
T1: READ balance = 1000
T2: UPDATE balance = 500 (commits)
T1: READ balance = 500 (different value in same transaction!)
Fixed by: REPEATABLE READ.
Phantom Readโ
T1: SELECT COUNT(*) WHERE amount > 100 โ 5 rows
T2: INSERT new row WHERE amount = 200 (commits)
T1: SELECT COUNT(*) WHERE amount > 100 โ 6 rows (phantom!)
Fixed by: SERIALIZABLE.
Lost Updateโ
T1: READ balance = 1000
T2: READ balance = 1000
T1: UPDATE balance = 1000 + 100 = 1100 (commits)
T2: UPDATE balance = 1000 + 50 = 1050 (overwrites T1!)
Final: 1050 instead of 1150
Fixed by: Pessimistic lock, optimistic lock, or UPDATE ... SET balance = balance + 50.
Write Skewโ
Two transactions read the same data, make decisions based on it, then write different records.
Constraint: At least one doctor must be on call.
T1: Reads: Alice on_call=true, Bob on_call=true โ Alice can go off-call
T2: Reads: Alice on_call=true, Bob on_call=true โ Bob can go off-call
T1: UPDATE Alice SET on_call=false
T2: UPDATE Bob SET on_call=false
Result: Nobody on call! Constraint violated.
Fix: SERIALIZABLE isolation or explicit SELECT FOR UPDATE on the check.
Distributed Consistency Patternsโ
Eventual Consistencyโ
Write โ Primary DB โ Propagate to replicas (async)
Read from replica โ might get stale data
Acceptable for: Social feed, product views, analytics
Not acceptable for: Bank balance, inventory count
Read-Your-Writes Consistencyโ
// After write, route subsequent reads to primary for the session
public User updateAndReturn(Long userId, UpdateRequest req) {
User user = repo.save(mapper.toEntity(req));
// Signal: next read for this user must hit primary
sessionStore.set("primary_read:" + userId, "1", Duration.ofSeconds(5));
return user;
}
public User findUser(Long userId) {
boolean mustReadPrimary = sessionStore.exists("primary_read:" + userId);
if (mustReadPrimary) {
return primaryRepo.findById(userId);
}
return replicaRepo.findById(userId);
}
Causal Consistencyโ
Operations causally related are seen in order.
User posts comment โ Sees own comment (read-your-writes)
User B replies โ Sees original comment + reply (causal order preserved)
Conflict Resolution (Multi-Master)โ
Last-Write-Wins (LWW)โ
T1 writes value=100 at timestamp=1000
T2 writes value=200 at timestamp=1001
Winner: T2 (higher timestamp)
Problem: Clock skew โ timestamps can't be trusted across nodes
CRDT (Conflict-free Replicated Data Types)โ
Data structures that merge without conflicts.
// G-Counter (grow-only) โ each node tracks its own count
Map<String, Long> nodeCounters = {
"node1": 5,
"node2": 3,
"node3": 7
}
// Total = sum of all = 15
// Merge: take max per node
Application-Level Resolutionโ
// User profile merge: newest non-null field wins
public UserProfile merge(UserProfile local, UserProfile remote) {
return UserProfile.builder()
.name(newerNonNull(local.getName(), local.getNameTs(),
remote.getName(), remote.getNameTs()))
.email(newerNonNull(local.getEmail(), local.getEmailTs(),
remote.getEmail(), remote.getEmailTs()))
.build();
}
Transactional Outbox Pattern (Solving the Dual-Write Problem)โ
To prevent inconsistency caused by the dual-write problem (writing to a database and publishing an event to a message broker sequentially), use the Transactional Outbox Pattern.
This pattern guarantees at-least-once event publishing by saving the event payload to an outbox_events database table within the same local transaction as the business operation, and subsequently exporting it via a polling publisher or Change Data Capture (CDC).
For a complete guide, PostgreSQL schemas, Spring Boot implementation code, relay strategies (Polling with SKIP LOCKED vs. Debezium CDC), and production checklists, see the dedicated Transactional Outbox Pattern Guide.
Quorum Reads and Read Repairโ
Beginner Viewโ
In leaderless replication, quorum is configured with N replicas:
Wreplicas must acknowledge a writeRreplicas are queried for a read
If W + R > N, reads are more likely to see latest writes.
Senior Deep Diveโ
Example with N=3:
- Stronger freshness:
W=2, R=2 - Higher availability:
W=1, R=1
Read repair strategy:
- Read from multiple replicas
- Compare versions/vector clocks
- Return latest to client
- Asynchronously repair stale replicas
Tradeoffsโ
- Higher
Rincreases read latency but reduces stale reads - Lower
Wimproves write availability but increases reconciliation work - Hot partitions can trigger repair storms under read-heavy load
Idempotency Patternsโ
Database Constraintโ
-- Natural idempotency via UNIQUE constraint
CREATE TABLE processed_payments (
idempotency_key VARCHAR(100) PRIMARY KEY,
payment_id BIGINT NOT NULL,
result JSONB NOT NULL,
processed_at TIMESTAMPTZ NOT NULL
);
-- On duplicate: INSERT ... ON CONFLICT DO NOTHING
Application-Levelโ
public PaymentResult processPayment(PaymentRequest req) {
return processedRepo.findByKey(req.getIdempotencyKey())
.map(p -> p.getResult()) // Return cached result
.orElseGet(() -> {
PaymentResult result = doProcess(req);
processedRepo.save(new ProcessedPayment(req.getIdempotencyKey(), result));
return result;
});
}
Locking Patterns: Local to Distributedโ
Local Database Locksโ
Advisory Locks (PostgreSQL)โ
-- Application-level lock, not tied to a row
SELECT pg_advisory_xact_lock(user_id); -- Lock for this transaction
-- OR
SELECT pg_try_advisory_lock(user_id); -- Non-blocking attempt
SELECT FOR UPDATE SKIP LOCKEDโ
-- Worker picks up jobs without blocking other workers
SELECT * FROM jobs
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED; -- Skip rows locked by other workers
// Spring Data JPA
@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT j FROM Job j WHERE j.status = 'PENDING' ORDER BY j.createdAt LIMIT 10")
List<Job> claimJobs();
Distributed Locking & Coordinationโ
A distributed lock coordinates mutually exclusive work across multiple independent nodes or processes (e.g., preventing two scheduled pods from running the same billing job simultaneously).
Why Naive Locks Failโ
A naive lock implementation like SET lock_key worker-A NX followed by DEL lock_key fails in production due to:
- Worker Crashes: If a worker crashes before deleting the key, the lock is stuck forever. (Mitigation: Add a Lease/TTL).
- GC Pauses / Network Delays: A worker acquires a lock with a 30s TTL. A JVM Garbage Collection (GC) pause halts the worker for 35s. The lock expires, another worker acquires it, and both workers execute the critical section concurrently.
- Clock Skew: Relying on physical system time sync across nodes for lock expiration leads to split ownership.
The Solution: Leases and Fencing Tokensโ
To make distributed locks safe, every lock must return a fencing token (a monotonically increasing number). The target storage system must validate the token on every write:
- Lock service returns token 101 to Worker A.
- Worker A goes into a GC pause. The lock expires.
- Lock service returns token 102 to Worker B.
- Worker B writes to storage (token 102 is recorded as active).
- Worker A wakes up and attempts to write to storage with token 101.
- Storage rejects Worker A's write because
101 < 102.
Implementation A: Redis Distributed Lock (Jedis/Lettuce/Redisson)โ
Redis locking uses SET key value NX PX milliseconds to acquire, and an atomic Lua script to release (ensuring a worker only deletes the lock if they own it):
public class RedisLock {
private final RedisTemplate<String, String> redisTemplate;
private final String lockKey;
private final String ownerId;
private final long leaseTimeMs;
public boolean tryLock() {
Boolean acquired = redisTemplate.opsForValue()
.setIfAbsent(lockKey, ownerId, leaseTimeMs, TimeUnit.MILLISECONDS);
return Boolean.TRUE.equals(acquired);
}
public void unlock() {
String script = """
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
""";
redisTemplate.execute(
new DefaultRedisScript<>(script, Long.class),
Collections.singletonList(lockKey),
ownerId
);
}
}
RedLock Algorithm & Criticismsโ
To overcome single-point-of-failure issues, the RedLock algorithm acquires locks across N independent Redis nodes (needing a quorum of N/2 + 1 nodes to succeed).
However, distributed systems expert Martin Kleppmann criticized RedLock because:
- It relies on physical clock synchronization (system time) to calculate lease durations, which is unsafe due to clock drift.
- It does not natively issue fencing tokens, meaning it cannot protect against GC-pause concurrency anomalies without storage-level checks.
- Best practice: For strict safety, use ZooKeeper or database advisory locks; for high-throughput, soft coordination, Redis is excellent.
Implementation B: ZooKeeper Distributed Lockโ
ZooKeeper achieves lock safety via ephemeral sequential nodes and watchers. Ephemeral nodes delete themselves automatically if the client's session disconnects, preventing permanent deadlocks.
public class ZooKeeperLock {
private final ZooKeeper zk;
private final String lockPath;
private String currentPath;
public boolean tryLock() throws Exception {
// 1. Create ephemeral sequential node
currentPath = zk.create(lockPath + "/lock-",
new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
while (true) {
// 2. Get all children and sort them
List<String> children = zk.getChildren(lockPath, false);
Collections.sort(children);
// 3. If our node is the smallest, we have the lock
String firstNode = children.get(0);
if (currentPath.endsWith(firstNode)) {
return true;
}
// 4. Otherwise, watch the node immediately preceding ours
int index = children.indexOf(currentPath.substring(lockPath.length() + 1));
String watchPath = lockPath + "/" + children.get(index - 1);
CountDownLatch latch = new CountDownLatch(1);
if (zk.exists(watchPath, event -> {
if (event.getType() == EventType.NodeDeleted) {
latch.countDown();
}
}) != null) {
latch.await(); // Block until the previous node is deleted
}
}
}
public void unlock() throws Exception {
zk.delete(currentPath, -1);
}
}
Implementation C: Kubernetes Lease Coordinationโ
Kubernetes uses the Lease resource in coordination.k8s.io to coordinate leader election and active locks across controller pods:
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
name: lead-election-lock
namespace: default
spec:
holderIdentity: pod-worker-a
leaseDurationSeconds: 15
acquireTime: "2026-06-05T15:00:00Z"
renewTime: "2026-06-05T15:00:10Z"
Pods periodically issue heartbeat updates to the renewTime field. If a leader crashes, its lease expires, and other candidate pods attempt to update holderIdentity with their own ID.
Leader Election Algorithmsโ
When operations are "leader-only," distributed systems elect a single coordinator node using consensus algorithms:
- Bully Algorithm: Nodes broadcast elections to nodes with higher IDs. If no higher ID responds, the caller becomes the leader.
- Ring-based Election: Nodes send election tokens around a logical ring. The node with the highest ID is elected when the token completes the circle.
- Raft/Paxos Election: Term-based candidate voting with randomized timeouts. The candidate obtaining a quorum of votes becomes the leader.
Distributed Coordination Patternsโ
- Barriers: Block all processes until a specified number of participants join.
- Distributed Semaphores: Coordinate access to a pool of
Nshared resources. - Distributed Counters: Maintain a consistent numeric value (e.g., using Redis INCR or Paxos register writes).
How Transactions Work Internallyโ
Transaction Lifecycleโ
BEGIN โ Acquire locks โ Execute statements โ Write to WAL โ Commit โ Release locks
- BEGIN: Start transaction, assign transaction ID
- Acquire locks: Get read/write locks on affected rows
- Execute statements: Apply changes in memory
- Write to WAL: Log changes to Write-Ahead Log (durability)
- Commit: Mark transaction as committed in WAL
- Release locks: Free acquired locks
Write-Ahead Logging (WAL)โ
WAL ensures durability by writing changes to a log before applying them to data files.
Transaction T1:
1. Write "BEGIN T1" to WAL
2. Write "UPDATE accounts SET balance=500 WHERE id=1" to WAL
3. Write "COMMIT T1" to WAL
4. fsync() WAL to disk
5. Apply changes to data files (can be deferred)
Benefits:
- Crash recovery: replay WAL to restore committed transactions
- Checkpointing: periodically flush dirty pages to reduce recovery time
- Replication: stream WAL to replicas
Multi-Version Concurrency Control (MVCC)โ
MVCC allows readers to not block writers and vice versa by maintaining multiple versions of rows.
Row: accounts(id=1, balance=1000)
T1 (READ): Sees balance=1000 at timestamp=100
T2 (UPDATE): Creates new version balance=900 at timestamp=200
T1 (READ): Still sees balance=1000 (its snapshot)
T2 (COMMIT): New version becomes visible to transactions starting after 200
Implementation:
- Each row has
xmin(creation transaction) andxmax(deletion transaction) - Readers see rows where
xminis committed andxmaxis not committed - Old versions are cleaned up by vacuum process
Isolation Level Implementationโ
| Isolation Level | Implementation |
|---|---|
| READ UNCOMMITTED | No locks, reads uncommitted data |
| READ COMMITTED | Acquires write locks, releases after each statement |
| REPEATABLE READ | Snapshot isolation, locks held until commit |
| SERIALIZABLE | Full predicate locking or conflict detection |
Two-Phase Locking (2PL)โ
2PL ensures serializability by acquiring locks in growing phase and releasing in shrinking phase.
Growing phase: Acquire locks, never release
Shrinking phase: Release locks, never acquire
Problem: Can cause deadlocks.
Deadlock Detectionโ
Deadlock occurs when transactions wait for each other in a cycle.
T1: Locks row A, waits for row B
T2: Locks row B, waits for row A
โ Deadlock!
Detection:
- Wait-for graph: edges represent "waits for" relationships
- Cycle detection: find cycles in the graph
- Resolution: abort one transaction in the cycle
Prevention:
- Always acquire locks in consistent order
- Use timeouts
- Use lower isolation levels when possible
Distributed Transactions & Workflowsโ
Distributed systems cannot easily enforce local ACID transactions across database boundaries. Standard distributed transaction protocols and eventual consistency patterns coordinate these workflows:
- Two-Phase Commit (2PC) & Three-Phase Commit (3PC): Synchronous locking protocols to achieve atomic commit across multiple participants.
- Saga Pattern: A sequence of local ACID transactions coordinated via Orchestration (central controller) or Choreography (event-driven reactions), using compensating transactions to semantically reverse steps on failure.
For complete architectural breakdowns, comparative tables, database schemas, Spring Boot orchestrator/choreography examples, and safety invariants, see the dedicated Two-Phase Commit (2PC) & Three-Phase Commit (3PC) Guide and Saga Pattern Guide.
Consensus Algorithmsโ
Paxosโ
Paxos is a consensus algorithm for achieving agreement in a distributed system.
Roles:
- Proposer: Proposes values
- Acceptor: Accepts or rejects proposals
- Learner: Learns the chosen value
Phases:
- Prepare: Proposer sends prepare(n) to acceptors
- Promise: Acceptors promise not to accept proposals < n
- Accept: Proposer sends accept(n, v) to acceptors
- Accepted: Acceptors accept if no higher proposal seen
// Simplified Paxos proposer
class PaxosProposer {
private int proposalNumber;
private String acceptedValue;
public String propose(String value) {
// Phase 1: Prepare
List<Promise> promises = sendPrepare(proposalNumber);
if (promises.size() < quorum) return null;
// Phase 2: Accept
String valueToAccept = getValueFromPromises(promises, value);
List<Accept> accepts = sendAccept(proposalNumber, valueToAccept);
if (accepts.size() >= quorum) {
return valueToAccept;
}
return null;
}
}
Raftโ
Raft is a consensus algorithm designed for understandability.
Roles:
- Leader: Handles all client requests
- Follower: Responds to leader and candidate requests
- Candidate: Campaigns to become leader
Phases:
- Leader election: Followers timeout, become candidates, request votes
- Log replication: Leader appends entries to log, replicates to followers
- Safety: Leader must have all committed entries
// Simplified Raft node
class RaftNode {
private State state = State.FOLLOWER;
private int currentTerm;
private String votedFor;
private List<LogEntry> log;
private int commitIndex;
private int lastApplied;
public void onElectionTimeout() {
state = State.CANDIDATE;
currentTerm++;
votedFor = self;
requestVotes();
}
public void onVoteRequest(VoteRequest req) {
if (req.term > currentTerm ||
(req.term == currentTerm && votedFor == null)) {
votedFor = req.candidateId;
currentTerm = req.term;
return new VoteResponse(true, currentTerm);
}
return new VoteResponse(false, currentTerm);
}
}
Comparisonโ
| Algorithm | Understandability | Performance | Fault Tolerance |
|---|---|---|---|
| Paxos | Complex | High | High |
| Raft | Simple | High | High |
| ZAB | Complex | High | High |
Event Sourcing and CQRSโ
For a comprehensive guide on separating read and write models, synchronization via Domain Events, and Event Sourcing theory, see the centralized CQRS & Event Sourcing page.
Change Data Capture (CDC)โ
For a comprehensive guide on how CDC works, implementations like Debezium, and how it compares to polling, see the Change Data Capture (CDC) page.
Real-World Implementationsโ
PostgreSQLโ
- ACID: Full support with MVCC
- Isolation levels: READ COMMITTED, REPEATABLE READ, SERIALIZABLE
- Distributed transactions: Two-phase commit via
PREPARE TRANSACTION - CDC: Logical replication, WAL streaming
-- Set isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Advisory lock
SELECT pg_advisory_lock(12345);
-- Two-phase commit
BEGIN;
PREPARE TRANSACTION 'my_transaction';
COMMIT PREPARED 'my_transaction';
MySQLโ
- ACID: Full support with InnoDB
- Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE
- MVCC: Implemented via undo log
- Distributed transactions: XA transactions
-- Set isolation level
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- XA transaction
XA START 'my_transaction';
-- ... statements ...
XA END 'my_transaction';
XA PREPARE 'my_transaction';
XA COMMIT 'my_transaction';
MongoDBโ
- ACID: Multi-document ACID transactions (since 4.0)
- Isolation levels: Snapshot isolation
- Consistency: Tunable consistency (strong vs eventual)
- Conflict resolution: Last-write-wins
// Multi-document transaction
const session = client.startSession();
try {
await session.withTransaction(async () => {
await accountsCollection.updateOne(
{ _id: fromId },
{ $inc: { balance: -amount } },
{ session }
);
await accountsCollection.updateOne(
{ _id: toId },
{ $inc: { balance: amount } },
{ session }
);
});
} finally {
await session.endSession();
}
Cassandraโ
- BASE: Eventual consistency
- Consistency levels: ONE, QUORUM, ALL, LOCAL_QUORUM
- Conflict resolution: Last-write-wins with timestamps
- Lightweight transactions: Paxos-based for linearizable operations
// Consistency level
SimpleStatement query = SimpleStatement.builder("SELECT * FROM users WHERE id = ?")
.setConsistencyLevel(ConsistencyLevel.QUORUM)
.build();
// Lightweight transaction (Paxos)
PreparedStatement prepared = session.prepare(
"INSERT INTO users (id, name) VALUES (?, ?) IF NOT EXISTS"
);
BoundStatement bound = prepared.bind(id, name);
ResultSet result = session.execute(bound);
DynamoDBโ
- BASE: Eventual consistency by default
- Consistency levels: EVENTUAL, STRONG
- ACID: Transactions for multi-item operations
- Conflict resolution: Last-write-wins
// Strong consistent read
GetItemRequest request = GetItemRequest.builder()
.tableName("Users")
.key(Map.of("id", AttributeValue.fromS("123")))
.consistentRead(true)
.build();
// Transaction
TransactWriteItemsRequest transaction = TransactWriteItemsRequest.builder()
.transactItems(
TransactWriteItem.builder()
.update(Update.builder()
.tableName("Accounts")
.key(Map.of("id", AttributeValue.fromS("123")))
.updateExpression("SET balance = balance - :amt")
.expressionAttributeValues(Map.of(":amt", AttributeValue.fromN("100")))
.build())
.build(),
TransactWriteItem.builder()
.update(Update.builder()
.tableName("Accounts")
.key(Map.of("id", AttributeValue.fromS("456")))
.updateExpression("SET balance = balance + :amt")
.expressionAttributeValues(Map.of(":amt", AttributeValue.fromN("100")))
.build())
.build())
.build();
Google Spannerโ
- ACID: True external consistency across regions
- Consistency: Strong consistency via TrueTime
- Isolation: Serializable isolation
- Distributed transactions: Two-phase commit with TrueTime
-- Transaction
BEGIN TRANSACTION;
-- Read with timestamp
SELECT * FROM accounts WHERE id = 1;
-- Write
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
Application Transaction Integration Patternsโ
Optimistic Concurrency Control (OCC) with JPA @Versionโ
Optimistic Concurrency Control is ideal when read-to-write ratios are high and concurrent collisions are rare. It avoids taking locks by checking a version number at commit time:
@Entity
@Table(name = "accounts")
public class Account {
@Id
private Long id;
private BigDecimal balance;
@Version
private Long version; // Incremented automatically by Hibernate on update
}
@Service
public class AccountService {
@Transactional
public void transfer(Long fromId, Long toId, BigDecimal amount) {
Account from = accountRepository.findById(fromId).orElseThrow();
Account to = accountRepository.findById(toId).orElseThrow();
from.setBalance(from.getBalance().subtract(amount));
to.setBalance(to.getBalance().add(amount));
accountRepository.save(from);
accountRepository.save(to);
// If another transaction updated either account in the meantime,
// a database-level version mismatch is detected, and Hibernate
// throws OptimisticLockException, triggering a rollback.
}
}
Pessimistic Concurrency Control (PCC) with JPA @Lockโ
Pessimistic Concurrency Control is preferred under heavy contention. It locks the records immediately upon reading, preventing other writers from accessing them:
@Service
public class AccountService {
@Transactional
@Lock(LockModeType.PESSIMISTIC_WRITE) // Generates SELECT ... FOR UPDATE
public void transfer(Long fromId, Long toId, BigDecimal amount) {
Account from = accountRepository.findById(fromId).orElseThrow();
Account to = accountRepository.findById(toId).orElseThrow();
from.setBalance(from.getBalance().subtract(amount));
to.setBalance(to.getBalance().add(amount));
accountRepository.save(from);
accountRepository.save(to);
}
}
Pros and Consโ
Strong Consistency (Pros & Cons)โ
Pros:
- Guarantees correct data
- Simplifies application logic
- No conflict resolution needed
Cons:
- Lower availability
- Higher latency
- Harder to scale
Eventual Consistency (Pros & Cons)โ
Pros:
- High availability
- Low latency
- Easy to scale
Cons:
- Stale reads possible
- Complex conflict resolution
- Harder to reason about
Outbox Pattern (Pros & Cons)โ
For a complete breakdown of the trade-offs and performance characteristics of the Transactional Outbox Pattern, see the dedicated Transactional Outbox Pattern Guide.
Saga Pattern (Pros & Cons)โ
For a detailed analysis of Saga coordination options (Orchestration vs. Choreography) and their pros/cons, see the dedicated Saga Pattern Guide.
CRDTs (Pros & Cons)โ
Pros:
- Conflict-free merging
- High availability
- No coordination needed
Cons:
- Limited data types
- Metadata overhead
- Complex semantics
Interview Questionsโ
Q: Explain the ACID properties. Can you have a database that satisfies all four?โ
A: ACID means atomicity, consistency, isolation, and durability for transactions. Yes, many relational systems provide all four within a node/transaction scope, though distributed scale can relax guarantees for availability.
Q: What is a lost update and how do you prevent it?โ
A: Lost update happens when concurrent writers overwrite each other silently. Prevent with optimistic version checks, row locks, or serializable transaction boundaries.
Q: What is write skew? How do you detect and prevent it?โ
A: Write skew occurs when concurrent transactions read shared predicates and write different rows, violating a global invariant. Prevent with serializable isolation, explicit locking on invariant rows, or materialized guard rows.
Q: What is the dual-write problem in microservices?โ
A: Dual write is updating DB and publishing event separately, where one can succeed and the other fail. It creates inconsistent state between source of truth and downstream consumers.
Q: What is the transactional outbox pattern?โ
A: Write business data and outbox event in one local DB transaction, then relay events asynchronously. This guarantees no event is published without its corresponding state change.
Q: How do you implement read-your-writes consistency when using read replicas?โ
A: Route post-write reads to primary until replica catches up to the client's commit position. Use LSN/GTID tracking or sticky-session windows.
Q: What is a CRDT? When would you use one?โ
A: CRDTs are data types that merge concurrent updates deterministically without coordination. Use them for collaborative/offline systems where availability and conflict-free sync are priorities.
Q: What is the difference between optimistic and pessimistic concurrency control?โ
A: Optimistic control checks for conflicts at commit and retries on collision; pessimistic control locks resources before mutation. Choose based on contention level and latency sensitivity.
Q: How do you handle conflicts in a multi-master database setup?โ
A: Define deterministic merge policy (for example last-write-wins, field-level merge, or domain-specific resolver) and track causality/version metadata. Surface irreconcilable conflicts for business-level resolution.
Q: What database isolation level prevents phantom reads?โ
A: Serializable isolation prevents phantoms by enforcing full serial equivalence. Some engines also prevent phantoms at repeatable read using predicate/next-key locks.
Q: Explain the two-phase commit protocol and its limitations.โ
A: 2PC coordinates atomic commit across participants via prepare and commit phases. Limitations include blocking behavior, single point of failure, and vulnerability to network partitions.
Q: What is the difference between 2PC and 3PC?โ
A: 3PC adds a pre-commit phase to reduce blocking and improve recovery from coordinator failure, at the cost of higher latency and complexity.
Q: How does MVCC work and what are its benefits?โ
A: MVCC maintains multiple row versions so readers don't block writers. Benefits include higher concurrency, no read locks, and consistent snapshots.
Q: What is a saga pattern and when would you use it?โ
A: Saga breaks long transactions into local transactions with compensating actions. Use for long-running business processes across services where ACID is impractical.
Q: How does change data capture (CDC) work?โ
A: CDC reads database transaction logs and streams changes to consumers. It provides real-time, reliable change streaming without impacting application code. For more details, see the CDC Deep Dive.
Q: What is the difference between event sourcing and traditional state persistence?โ
A: Event sourcing stores state changes as events, enabling replay and audit trails. Traditional persistence stores only current state, losing history.
Q: Explain the CAP theorem in the context of data consistency.โ
A: CAP states that during network partitions, you must choose between consistency (all nodes see same data) and availability (all nodes can respond). You can't have both during partitions.
Q: How do you implement idempotency in distributed systems?โ
A: Use idempotency keys stored in a unique constraint table, or embed version/timestamp in data and check before applying changes.
Q: What is write-ahead logging (WAL) and why is it important?โ
A: WAL writes changes to a log before applying to data files, ensuring durability and enabling crash recovery by replaying the log.
Q: Why does a naive distributed lock (e.g., SET key NX PX) fail in production?โ
A: It fails due to GC pauses or network delays. If a worker is suspended by a JVM GC pause that exceeds the lock's TTL, the lock expires and another worker acquires it. When the first worker resumes, both execute the critical section concurrently. Naive locks also offer no protection against clock drift.
Q: What is a fencing token and how does it prevent concurrency anomalies?โ
A: A fencing token is a monotonically increasing number returned by the lock service (e.g., ZooKeeper's node version). The storage system records the token of the last write. If a client tries to write with a lower/stale token (due to a delay or GC pause), the storage system rejects the write, ensuring safety.
Q: How does the RedLock algorithm work and what are its main criticisms?โ
A: RedLock attempts to acquire locks on a quorum of independent Redis instances (e.g., 3 out of 5). Criticisms (e.g., by Martin Kleppmann) highlight that it relies on physical clocks for lease calculation (which drift and can jump), and it does not natively provide fencing tokens, making it unsafe for systems requiring absolute correctness.
Q: How do you handle a saga where a step succeeds, but its corresponding compensating transaction fails?โ
A: Use exponential backoff with jitter to retry the compensation. If it continues to fail (e.g., due to an error from the external gateway), transition the saga state to MANUAL_INTERVENTION_REQUIRED and route it to an operator queue. Do not discard the state.
Q: How would you design a distributed checkout flow that spans inventory, payment, and shipping?โ
A: Implement a Stateful Saga (Orchestrator). The Order Service acts as the coordinator. It writes the Saga state to its DB, then calls the Inventory Service to reserve stock. If successful, it calls the Payment Service to capture funds. If that succeeds, it calls the Shipping Service. If payment fails, the coordinator triggers compensating steps: releasing reserved stock. Each step uses unique idempotency keys to handle retries safely.
Senior Deep Dive: Advanced Topicsโ
Vector Clocksโ
Vector clocks track causality across distributed nodes.
public class VectorClock {
private Map<String, Long> clock = new HashMap<>();
public void increment(String nodeId) {
clock.merge(nodeId, 1L, Long::sum);
}
public void merge(VectorClock other) {
other.clock.forEach((node, value) ->
clock.merge(node, value, Math::max));
}
public boolean happenedBefore(VectorClock other) {
boolean anyLess = false;
boolean anyGreater = false;
Set<String> allNodes = new HashSet<>();
allNodes.addAll(clock.keySet());
allNodes.addAll(other.clock.keySet());
for (String node : allNodes) {
long thisValue = clock.getOrDefault(node, 0L);
long otherValue = other.clock.getOrDefault(node, 0L);
if (thisValue < otherValue) anyLess = true;
if (thisValue > otherValue) anyGreater = true;
}
return anyLess && !anyGreater;
}
public boolean isConcurrent(VectorClock other) {
return !happenedBefore(other) && !other.happenedBefore(this);
}
}
Lamport Clocksโ
Lamport clocks provide a partial order of events.
public class LamportClock {
private long timestamp = 0;
public synchronized long tick() {
return ++timestamp;
}
public synchronized long update(long receivedTimestamp) {
timestamp = Math.max(timestamp, receivedTimestamp) + 1;
return timestamp;
}
}
Hybrid Logical Clocks (HLC)โ
HLC combines physical and logical clocks for better ordering.
public class HybridLogicalClock {
private long physical;
private long logical;
private long nodeId;
public synchronized HLCTimestamp now() {
long nowPhysical = System.currentTimeMillis();
if (nowPhysical > physical) {
physical = nowPhysical;
logical = 0;
} else {
logical++;
}
return new HLCTimestamp(physical, logical, nodeId);
}
public synchronized HLCTimestamp update(HLCTimestamp received) {
long nowPhysical = System.currentTimeMillis();
if (nowPhysical > physical && nowPhysical > received.physical) {
physical = nowPhysical;
logical = 0;
} else if (received.physical > physical) {
physical = received.physical;
logical = received.logical + 1;
} else if (received.physical == physical) {
logical = Math.max(logical, received.logical) + 1;
}
return new HLCTimestamp(physical, logical, nodeId);
}
}
Distributed Snapshotsโ
Distributed snapshots capture global state consistently.
public class SnapshotCoordinator {
private Map<String, SnapshotState> nodeStates = new ConcurrentHashMap<>();
public void initiateSnapshot(String snapshotId) {
// Send marker to all nodes
nodes.forEach(node -> node.sendMarker(snapshotId));
// Wait for all nodes to acknowledge
awaitAllAcknowledgments(snapshotId);
// Snapshot is complete
System.out.println("Snapshot " + snapshotId + " complete");
}
public void onMarkerReceived(String nodeId, String snapshotId) {
nodeStates.computeIfAbsent(nodeId, k -> new SnapshotState())
.recordMarker(snapshotId);
}
public void onMessageReceived(String nodeId, String snapshotId, Message message) {
nodeStates.computeIfAbsent(nodeId, k -> new SnapshotState())
.recordMessage(snapshotId, message);
}
}
Consistency Models Hierarchyโ
Strong Consistency
โโโ Linearizability
โโโ Sequential Consistency
โโโ Causal Consistency
โโโ Read-Your-Writes
โโโ Monotonic Reads
โโโ Monotonic Writes
โโโ Eventual Consistency
CAP Theorem Revisitedโ
CAP states that during network partitions, you must choose between:
- Consistency: All nodes see same data simultaneously
- Availability: Every request receives a response
- Partition Tolerance: System continues despite network failures
In practice:
- CP systems: Choose consistency over availability (for example, HBase, MongoDB with strong consistency)
- AP systems: Choose availability over consistency (for example, Cassandra, DynamoDB)
- CA systems: Not possible in distributed systems with partitions
PACELC Theoremโ
PACELC extends CAP:
- Partition: Availability vs Consistency (same as CAP)
- Else (no partition): Latency vs Consistency
Partition: Availability vs Consistency
No partition: Latency vs Consistency
Examples:
- Cassandra: AP (availability during partition), else latency (eventual consistency)
- Couchbase: AP (availability during partition), else consistency (strong consistency when no partition)
- HBase: CP (consistency during partition), else consistency (strong consistency always)
Additional Resourcesโ
Booksโ
- "Designing Data-Intensive Applications" by Martin Kleppmann
- "Distributed Systems: Principles and Paradigms" by Andrew Tanenbaum
- "Database System Concepts" by Silberschatz, Korth, and Sudarshan
Papersโ
- "Time, Clocks, and the Ordering of Events in a Distributed System" by Leslie Lamport
- "Paxos Made Simple" by Leslie Lamport
- "In Search of an Understandable Consensus Algorithm" by Diego Ongaro and John Ousterhout
Toolsโ
- Debezium: Open-source CDC platform
- Apache Kafka: Distributed event streaming
- Apache ZooKeeper: Distributed coordination service
- etcd: Distributed key-value store
Standardsโ
- XA: Two-phase commit standard
- JTA: Java Transaction API
- JTS: Java Transaction Service
Best Practicesโ
Transaction Managementโ
- Keep transactions short to reduce lock contention
- Use appropriate isolation levels for your use case
- Handle deadlocks gracefully with retries
- Use read-only transactions for queries
Distributed Transactionsโ
- Prefer sagas over 2PC for long-running transactions
- Use idempotency keys for safe retries
- Implement compensating transactions for rollback
- Monitor transaction latency and failure rates
Eventual Consistencyโ
- Document consistency guarantees clearly
- Use versioning for conflict detection
- Implement read repair for stale data
- Provide consistency controls to users when needed
Outbox Patternโ
- Use ordered primary keys for outbox table
- Implement idempotent event publishing
- Include correlation IDs for tracing
- Clean up published outbox entries regularly
Conflict Resolutionโ
- Choose appropriate conflict resolution strategy
- Track causality with vector clocks
- Surface irreconcilable conflicts to users
- Implement merge UI for collaborative editing
Monitoringโ
- Track transaction latency and throughput
- Monitor deadlock rates and retry counts
- Alert on outbox backlog growth
- Measure consistency lag in eventually consistent systems
Testingโ
- Test concurrent access patterns
- Simulate network partitions
- Test failure scenarios and recovery
- Verify consistency guarantees under load