Distributed Tracing
In a monolithic application, tracing a request is straightforward since all calls execute on a single execution stack. In a microservices environment, a single user request can propagate through dozens of services across network boundaries. Distributed Tracing provides visibility into the complete journey of a request as it crosses process boundaries.
How It Works: Spans and Traces
Distributed tracing is built on three key concepts, illustrated by the example trace below and defined in the list that follows:
Trace (Global Request Journey - Trace ID: abc123xyz)
└─ Span A: Gateway (Client Request Received) [Duration: 100ms]
   ├─ Span B: Order Service (Create Order DB) [Duration: 40ms]
   └─ Span C: Payment Service (Charge Call) [Duration: 50ms]
      └─ Span D: Stripe API (External Network Hop) [Duration: 30ms]
- Trace: The complete end-to-end journey of a request. It is identified by a unique Trace ID (traceId) generated by the first service that receives the request (typically the API Gateway).
- Span: A single logical unit of work (e.g., an HTTP request, a database query, or a message publish). Each span has a Span ID, a parent Span ID (absent on the root span), a start timestamp, and a duration.
- Trace Context Propagation: To pass the traceId and the active spanId across process boundaries, HTTP and Kafka clients inject metadata headers into each outgoing call (most commonly the traceparent header defined by the W3C Trace Context standard).
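To make the propagation step concrete, here is a minimal sketch of how a service might read an incoming traceparent value and derive the value it forwards downstream. The class name TraceParent and its methods are hypothetical, written only for illustration. In a real Spring Boot service Micrometer Tracing does this for you, and the sketch does not check every rule in the W3C specification (for example, it does not reject all-zero IDs).

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a library class): parses and derives traceparent values. */
final class TraceParent {
    // Layout: version-traceId-parentId-flags, all lowercase hex.
    private static final Pattern FORMAT =
        Pattern.compile("^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    public final String version;
    public final String traceId;   // shared by every span in the trace
    public final String parentId;  // span ID of the caller
    public final String flags;     // bit 0 = sampled

    private TraceParent(String version, String traceId, String parentId, String flags) {
        this.version = version;
        this.traceId = traceId;
        this.parentId = parentId;
        this.flags = flags;
    }

    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        return new TraceParent(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    boolean isSampled() {
        return (Integer.parseInt(flags, 16) & 0x01) != 0;
    }

    /** Generates a random, non-zero 64-bit span ID as 16 hex characters. */
    static String newSpanId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0L);
        return String.format("%016x", id);
    }

    /** Value to send downstream: same trace ID and flags, our span as the new parent. */
    TraceParent childFor(String currentSpanId) {
        return new TraceParent(version, traceId, currentSpanId, flags);
    }

    String toHeader() {
        return version + "-" + traceId + "-" + parentId + "-" + flags;
    }
}
```

The key point is that the trace ID never changes along the call chain, while the parent ID is replaced at every hop with the span ID of the service making the call.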
Setup & Implementation (Spring Boot 3 + Micrometer Tracing)
In Spring Boot 3, Micrometer Tracing (which integrates with OpenTelemetry) handles context propagation automatically.
1. Add Dependencies
<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
2. Configure Propagation and Exporting
# application.yml
management:
  tracing:
    sampling:
      probability: 1.0 # Sample 100% of requests for debugging (use ~0.05-0.1 in high-traffic production)
  otlp:
    tracing:
      endpoint: http://jaeger-collector.monitoring:4318/v1/traces # Send spans to Jaeger
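To illustrate what a sampling probability means in practice, the sketch below makes a deterministic keep-or-drop decision from the trace ID, so every service that sees the same trace reaches the same verdict. The class ProbabilitySampler is hypothetical and is not the algorithm Micrometer or OpenTelemetry is guaranteed to use. It assumes 32-character hexadecimal trace IDs whose low 64 bits are effectively random.

```java
/** Hypothetical illustration of probability sampling keyed on the trace ID. */
final class ProbabilitySampler {
    private final double probability;
    private final long upperBound;

    ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
        // Fraction of the non-negative 63-bit range that should be kept.
        this.upperBound = (long) (probability * Long.MAX_VALUE);
    }

    boolean shouldSample(String traceId) {
        if (probability >= 1.0) {
            return true;
        }
        if (probability <= 0.0) {
            return false;
        }
        // Use the low 64 bits (last 16 hex characters) of the trace ID.
        String low = traceId.substring(traceId.length() - 16);
        long value = Long.parseUnsignedLong(low, 16);
        // Clear the sign bit so the value falls in [0, Long.MAX_VALUE].
        return (value & Long.MAX_VALUE) < upperBound;
    }
}
```

Because the decision depends only on the trace ID, a trace is either recorded by every participating service or by none, which avoids partially recorded traces. With a probability of 0.1, roughly one trace in ten is kept.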
3. Java Code Example: Manual Span Instrumentation
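Spring Boot automatically traces incoming HTTP requests, as well as outgoing calls made through clients built from its auto-configured builders (for example RestTemplateBuilder or WebClient.Builder). For critical business logic, you can define custom spans: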
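import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class OrderProcessingService {

    private final Tracer tracer;

    public OrderProcessingService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void processComplexOrder(Order order) {
        // Create and start a custom span; it becomes a child of the current span, if any
        Span customSpan = tracer.nextSpan().name("complex-validation").start();
        // Put the span in scope so log statements and nested calls pick up its context
        try (Tracer.SpanInScope ws = tracer.withSpan(customSpan)) {
            // Tag the span with useful context
            customSpan.tag("order.id", String.valueOf(order.getId()));
            customSpan.tag("customer.tier", order.getCustomerTier());

            // Run business logic (validateOrderConstraints is defined elsewhere in this class)
            log.info("Validating order rules within custom span context");
            validateOrderConstraints(order);
        } catch (Exception e) {
            // Record the failure on the span before rethrowing
            customSpan.error(e);
            throw e;
        } finally {
            customSpan.end(); // Always end the span, otherwise it is never reported
        }
    }
}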
Log Correlation
To correlate log lines with traces, configure your logging layout to print the active Trace ID and Span ID on every line. Micrometer Tracing exposes both values through the logging MDC under the keys traceId and spanId:
<!-- logback-spring.xml -->
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level [traceId=%X{traceId}, spanId=%X{spanId}] %logger{36} - %msg%n</pattern>
This generates output matching the pattern:
2026-07-03 10:15:30.123 [http-nio-8080-exec-1] INFO [traceId=d22f03f7e53a2ba1, spanId=b9a8cf5c66e2c342] c.e.o.OrderService - Saving order database entry
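Once every line carries the IDs, log output from many services can be regrouped per request. The following sketch assumes the exact bracketed layout shown above. The class TraceLogGrouper is a hypothetical helper, not part of any logging library, and a log aggregation platform would normally do this work for you.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: groups log lines by the traceId printed in each line. */
final class TraceLogGrouper {
    // Matches the "[traceId=..., spanId=...]" segment of the pattern above.
    private static final Pattern IDS =
        Pattern.compile("\\[traceId=([0-9a-f]*), spanId=([0-9a-f]*)\\]");

    static Map<String, List<String>> groupByTraceId(List<String> lines) {
        Map<String, List<String>> byTrace = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = IDS.matcher(line);
            // Lines logged outside any trace print empty IDs; skip them.
            if (m.find() && !m.group(1).isEmpty()) {
                byTrace.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(line);
            }
        }
        return byTrace;
    }
}
```

Passing the collected lines of several services to groupByTraceId returns one list per trace, in the order the lines were supplied, which makes it straightforward to read a single request's story across service boundaries.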
Pros vs. Cons
| Pros | Cons |
|---|---|
| Rapid Diagnostics: Pinpoints precisely which service in a call graph is throwing errors or adding latency. | Storage Overhead: Tracing generates massive volumes of data; storing every trace is expensive. |
| Dependency Graphing: Automatically builds service-to-service dependency topologies for architectural auditing. | Context Propagation Fragility: If any service fails to forward the traceparent headers, the trace is broken. |
| Performance Bottleneck Detection: Highlights slow database queries or blocking client HTTP calls within a request flow. | Code Intrusion: Instrumenting legacy systems or custom protocols requires complex manual setup. |
Common Gotchas & Anti-Patterns
- Broken Trace Chains: Forgetting to propagate the trace context when spawning asynchronous threads manually in Java (e.g., using Runnable or a custom ExecutorService). The child threads will execute under a brand new traceId or without any context.
  - Solution: Use Micrometer's ContextExecutorService to wrap thread pools.
- Sampling Errors: Sampling 100% of traffic in a high-scale production system. This generates massive network traffic and storage bills. Use adaptive sampling or head/tail-based sampling instead.
- Mismatched Propagation Headers: The API Gateway sends W3C traceparent headers (00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01), but downstream services are configured to look for Zipkin B3 headers (X-B3-TraceId). This causes traces to split. Standardize on W3C.
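The Broken Trace Chains item above comes down to how thread-bound state behaves. The sketch below uses a plain ThreadLocal as a stand-in for the tracer's current context and shows why a task submitted to a pool loses it, and how wrapping the task restores it. All names here (ContextPropagationSketch, CURRENT_TRACE_ID, wrap, demo) are hypothetical. This is a simplified model of the idea behind context-propagating executors, not Micrometer's actual implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model of thread-bound trace context and a propagating wrapper. */
final class ContextPropagationSketch {
    // Stand-in for the tracer's per-thread "current trace" state.
    static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's trace ID now and restores it around the task later. */
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = CURRENT_TRACE_ID.get();
        return () -> {
            String previous = CURRENT_TRACE_ID.get();
            CURRENT_TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Leave the pooled worker thread as we found it.
                if (previous == null) {
                    CURRENT_TRACE_ID.remove();
                } else {
                    CURRENT_TRACE_ID.set(previous);
                }
            }
        };
    }

    /** Returns { trace ID seen by a plain task, trace ID seen by a wrapped task }. */
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> readTraceId = CURRENT_TRACE_ID::get;
        CURRENT_TRACE_ID.set("4bf92f3577b34da6a3ce929d0e0e4736");
        try {
            String unwrapped = pool.submit(readTraceId).get();     // worker has no context
            String wrapped = pool.submit(wrap(readTraceId)).get(); // context restored
            return new String[] { unwrapped, wrapped };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            CURRENT_TRACE_ID.remove();
            pool.shutdown();
        }
    }
}
```

In this model the plain task sees null because the worker thread never had the value set, while the wrapped task sees the caller's trace ID. Wrapping the whole executor once, rather than each task, applies the same capture-and-restore step to every submission.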