Saga Pattern (Distributed Workflows)
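Synchronous coordination protocols such as two-phase commit (2PC) and three-phase commit (3PC) require every participant to hold locks until a coordinator decides the outcome. They therefore scale poorly and couple the availability of all participating services, so modern microservice architectures favor eventual consistency using the Saga Pattern.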
This guide covers Saga coordination patterns, compensation semantics, Spring Boot code examples, and handling runtime failures in production.
Saga Pattern Overview
A Saga decomposes a distributed transaction into a sequence of local ACID transactions ($T_1, T_2, \dots, T_n$) on individual service databases. Each local transaction updates the database and publishes an event or message to trigger the next step in the saga.
T1 (Order Created) ──► T2 (Stock Reserved) ──► T3 (Payment Charged) ──► Complete
Compensating Transactions
If a local transaction fails (e.g., payment is declined), the saga must explicitly undo the effects of the preceding, already-committed steps by executing compensating transactions ($C_1, C_2, \dots, C_{n-1}$) in reverse order.
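Stated as a sequence: if step $T_k$ ($1 \le k \le n$) fails, the saga executes

```latex
T_1,\ T_2,\ \dots,\ T_{k-1},\ \underbrace{T_k}_{\text{fails}},\ C_{k-1},\ C_{k-2},\ \dots,\ C_1
```

$T_k$ itself needs no compensation because its own local ACID transaction rolled back, and for $k = 1$ there is nothing to compensate. Since a successful $T_n$ completes the saga, no $C_n$ is ever required, which is why the compensations run only from $C_1$ to $C_{n-1}$.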
T1 (Order) ──► T2 (Inventory) ──► T3 (Payment) ──► FAIL (Declined)
                                                   │ (Trigger compensation)
T1_Compensate ◄── T2_Compensate ◄──────────────────┘
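The reverse-order rule can be sketched in plain Java. This is a minimal in-memory illustration under stated assumptions, not a production saga engine: `SagaStep` and `SagaExecutor` are hypothetical names, each step is assumed to signal failure by throwing a `RuntimeException`, and compensations are assumed to succeed (failed compensations are the subject of the Saga Escalation Playbook).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical saga step: a local transaction plus the compensation that semantically undoes it.
record SagaStep(String name, Runnable action, Runnable compensation) {}

// Hypothetical in-memory executor illustrating reverse-order compensation.
final class SagaExecutor {
    // Runs steps in order; on the first failure, compensates the completed steps
    // in reverse order. Returns true if every step succeeded.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step); // push adds at the head, so pop() yields LIFO order
            } catch (RuntimeException failure) {
                // The failed step's own local transaction rolled back, so it is not compensated.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

With three steps where `T3` throws, the recorded execution order is `T1, T2, C2, C1`, matching the diagram above. A stack (`Deque`) is the natural structure here because compensation order is exactly the reverse of completion order.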
[!CAUTION] Compensations are not database rollbacks; they are semantic undo operations. For instance, if $T_2$ reserves stock, $C_2$ releases the stock. If $T_3$ charges a card, $C_3$ issues a refund. If $T_1$ inserts a record, $C_1$ might mark it as CANCELLED rather than physically deleting it, preserving the audit trail.
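The "mark, don't delete" compensation for $T_1$ can be sketched as follows. `OrderTable` and `OrderStatus` are hypothetical, and the in-memory `HashMap` merely stands in for the Order service's database table.

```java
import java.util.HashMap;
import java.util.Map;

enum OrderStatus { PENDING, CANCELLED }

// Hypothetical in-memory order table used only to illustrate semantic compensation.
final class OrderTable {
    private final Map<String, OrderStatus> rows = new HashMap<>();

    // T1: insert the order row.
    void create(String orderId) {
        rows.put(orderId, OrderStatus.PENDING);
    }

    // C1: semantic undo. The row is kept and flagged rather than physically deleted,
    // so the audit trail survives. Repeating the call leaves the same state (idempotent).
    void cancel(String orderId) {
        rows.computeIfPresent(orderId, (id, status) -> OrderStatus.CANCELLED);
    }

    OrderStatus statusOf(String orderId) {
        return rows.get(orderId);
    }

    int rowCount() {
        return rows.size();
    }
}
```

After `create("o-1")` followed by `cancel("o-1")`, the table still holds one row, now `CANCELLED`. Because `cancel` is idempotent, it is also safe to retry.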
Saga Coordination Styles
There are two primary patterns for coordinating a Saga: Choreography and Orchestration.
Choreography (Event-Driven)
In a choreography, there is no central controller. Each service reacts to events from other services and publishes its own events to trigger subsequent steps.
Choreography Code Example (Inventory Service)
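import lombok.RequiredArgsConstructor;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor
public class InventoryChoreographer {

    private final InventoryRepository inventoryRepository;
    private final KafkaTemplate<String, Object> kafkaTemplate;

    // Caveat: the DB write and the Kafka send below are not atomic; production code
    // usually publishes through a transactional outbox to avoid lost or phantom events.
    @KafkaListener(topics = "order-placed-events")
    public void onOrderPlaced(OrderPlacedEvent event) {
        // Key events by order ID so all events for one order share a partition (per-order ordering)
        String key = String.valueOf(event.getOrderId());
        try {
            // Local transaction: reserve stock in the Inventory service's own database
            inventoryRepository.reserve(event.getOrderId(), event.getItems());
            kafkaTemplate.send("inventory-reserved-events", key, new InventoryReservedEvent(event.getOrderId()));
        } catch (InsufficientStockException e) {
            // Failure event: the Order service reacts by compensating T1 (cancelling the order)
            kafkaTemplate.send("inventory-failed-events", key, new InventoryFailedEvent(event.getOrderId()));
        }
    }
}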
- Pros: Simple to start; highly decoupled services.
- Cons: Hard to visualize the workflow; risk of cyclic dependencies; debugging requires complex distributed tracing.
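To see the whole event flow, including the failure path, a two-service choreography can be simulated in memory. This is a sketch under stated assumptions: `EventBus` and `ChoreographyDemo` are hypothetical, synchronous in-process delivery stands in for Kafka, the topic names are reused from the example above, and the payment step is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical synchronous in-memory bus standing in for Kafka topics.
final class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String orderId) {
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(orderId));
    }
}

final class ChoreographyDemo {
    // Wires the Order and Inventory services to the bus and returns the final order status.
    static String placeOrder(String orderId, boolean stockAvailable) {
        EventBus bus = new EventBus();
        Map<String, String> orderStatus = new HashMap<>();

        // Order service: reacts to inventory outcomes; on failure it compensates its own T1.
        bus.subscribe("inventory-reserved-events", id -> orderStatus.put(id, "STOCK_RESERVED"));
        bus.subscribe("inventory-failed-events", id -> orderStatus.put(id, "CANCELLED"));

        // Inventory service: reacts to order-placed-events, like InventoryChoreographer.
        bus.subscribe("order-placed-events", id -> bus.publish(
                stockAvailable ? "inventory-reserved-events" : "inventory-failed-events", id));

        orderStatus.put(orderId, "PENDING");          // T1 in the Order service
        bus.publish("order-placed-events", orderId);  // starts the saga; no central controller
        return orderStatus.get(orderId);
    }
}
```

No component in this sketch knows the full workflow; each handler only knows which topic it consumes and which it produces. That is the source of both the decoupling and the visibility problem listed above.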
Orchestration (Central Coordinator)
In an orchestration, a dedicated service acts as the Orchestrator (the brain). It issues commands to participants, waits for responses, and coordinates the execution of tasks and compensations.
Orchestrator Code Example
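import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class OrderSagaOrchestrator {

    private final OrderRepository orderRepository;
    private final InventoryClient inventoryClient;
    private final PaymentClient paymentClient;

    // Deliberately not @Transactional: a single database transaction must not span remote calls.
    // With Spring Data JPA, each save() commits in its own transaction, so the order's saga
    // state is durable and survives a crash or a failed compensation instead of rolling back.
    public void executeSaga(CreateOrderCommand cmd) {
        // T1: persist the order before calling any participant
        Order order = orderRepository.save(Order.create(cmd));
        try {
            // Step 1 (T2): Reserve stock
            inventoryClient.reserve(order.getId(), cmd.getItems());
            // Step 2 (T3): Process payment
            paymentClient.charge(order.getId(), cmd.getPaymentInfo());
            order.markCompleted();
        } catch (InventoryException ex) {
            // Step 1 failed: nothing was reserved, so only T1 is compensated
            order.markFailed("Stock reservation failed");
        } catch (PaymentException ex) {
            // Step 2 failed: compensate Step 1 (C2), then mark the order failed (C1).
            // If release() throws, the order stays in its initial persisted state for
            // the escalation process described in the Saga Escalation Playbook.
            inventoryClient.release(order.getId(), cmd.getItems());
            order.markFailed("Payment failed: " + ex.getMessage());
        }
        orderRepository.save(order);
    }
}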
- Pros: Centralized visibility; easy to test and debug; clear separation of business workflows.
- Cons: Coordinator is a single point of failure (SPOF); risk of centralization anti-pattern where the orchestrator owns too much business logic.
Saga Escalation Playbook
If a compensating transaction itself fails (e.g., the inventory client times out while trying to release reserved stock), the saga coordinator cannot resolve the inconsistency automatically, because there is no further compensation to fall back on. You must build an escalation pipeline:
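- Exponential Retry: Retry the compensation step with exponential backoff and jitter. Because a call that timed out may in fact have succeeded, compensations must be idempotent for retries to be safe.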
- Transition to Manual Alerting: If retries are exhausted, set the saga state to MANUAL_INTERVENTION_REQUIRED and publish a critical alert.
- Dedicated Admin Dashboard: Provide operators an administrative UI to manually resolve or retry compensation tasks.
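The first two escalation stages can be sketched as a small retry helper. Everything here is an assumption for illustration: `CompensationRetrier` and `CompensationOutcome` are hypothetical names, the attempt count and delays are caller-supplied parameters, and "full jitter" (a uniformly random sleep up to the exponential ceiling) is one common jitter strategy among several. The sleep function is injected so the logic can be tested without real delays.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongConsumer;

// Outcome the caller persists as the saga state.
enum CompensationOutcome { COMPENSATED, MANUAL_INTERVENTION_REQUIRED }

// Hypothetical helper: retries one compensation with capped exponential backoff and full jitter.
final class CompensationRetrier {
    private final int maxAttempts;
    private final long baseDelayMs;
    private final long maxDelayMs;
    private final LongConsumer sleeper; // injected so tests can avoid real sleeping

    CompensationRetrier(int maxAttempts, long baseDelayMs, long maxDelayMs, LongConsumer sleeper) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.sleeper = sleeper;
    }

    // The compensation must be idempotent: an attempt that "failed" may still have taken effect.
    CompensationOutcome run(Runnable compensation) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                compensation.run();
                return CompensationOutcome.COMPENSATED;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // no sleep after the final attempt
                }
                // Exponential ceiling base * 2^(attempt - 1), capped at maxDelayMs (shift clamped vs overflow)
                long ceiling = Math.min(maxDelayMs, baseDelayMs * (1L << Math.min(attempt - 1, 30)));
                // Full jitter: sleep a uniformly random duration in [0, ceiling]
                sleeper.accept(ThreadLocalRandom.current().nextLong(ceiling + 1));
            }
        }
        // Retries exhausted: the caller records MANUAL_INTERVENTION_REQUIRED and raises a critical alert
        return CompensationOutcome.MANUAL_INTERVENTION_REQUIRED;
    }
}
```

In production the sleeper could wrap `Thread.sleep`, e.g. `ms -> { try { Thread.sleep(ms); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); } }`. A `MANUAL_INTERVENTION_REQUIRED` result is what the admin dashboard would then surface to operators.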