
AWS X-Ray

Core concept: X-Ray provides distributed tracing, letting you see how requests flow through your entire application (API Gateway → Lambda → DynamoDB → external services).


Key Concepts

| Concept | Description |
|---|---|
| Trace | End-to-end path of a request through all services |
| Segment | One service's portion of a trace (e.g., Lambda execution) |
| Subsegment | Work within a segment (e.g., a DynamoDB call) |
| Annotation | Key-value pair indexed for filtering/searching |
| Metadata | Key-value pair that is not indexed; for debugging detail |
| Service Map | Visual graph of all services and their connections |
| Sampling | Percentage of requests to trace (controls cost) |

Annotations vs Metadata

| | Annotations | Metadata |
|---|---|---|
| Indexed | ✅ Yes (searchable) | ❌ No |
| Use for | Filtering traces (userId, orderId) | Debugging data (full request/response) |
| Type | String, number, boolean | Any (JSON) |
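
import com.amazonaws.xray.AWSXRay;
import com.amazonaws.xray.entities.Subsegment;

Subsegment subsegment = AWSXRay.beginSubsegment("ProcessOrder");
try {
    // Annotations are indexed, so traces can be searched by these keys
    subsegment.putAnnotation("orderId", orderId);
    subsegment.putAnnotation("customerId", customerId);

    // Metadata holds rich debugging detail (not searchable)
    subsegment.putMetadata("orderDetails", orderMap);

    processOrder(orderId);
} catch (Exception e) {
    // Record the failure on the subsegment before rethrowing
    subsegment.addException(e);
    throw e;
} finally {
    // Always close the subsegment, even when an exception is thrown
    AWSXRay.endSubsegment();
}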
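
Once an annotation is recorded, it can be matched with a filter expression of the form annotation.KEY = "value", for example in the console search box or the FilterExpression parameter of GetTraceSummaries. The helper below is only a sketch of how such an expression might be assembled; the class and method names are hypothetical, and rejecting values that contain quotes is a simplification rather than X-Ray behavior.

```java
import java.util.regex.Pattern;

public class AnnotationFilter {
    // Annotation keys must be alphanumeric (underscore allowed) to work with filters.
    private static final Pattern VALID_KEY = Pattern.compile("[A-Za-z0-9_]+");

    /** Builds a filter expression such as: annotation.orderId = "ORD-1001" */
    public static String equalsExpression(String key, String value) {
        if (!VALID_KEY.matcher(key).matches()) {
            throw new IllegalArgumentException("Invalid annotation key: " + key);
        }
        // Simplification for this sketch: refuse values that would need escaping.
        if (value.contains("\"")) {
            throw new IllegalArgumentException("Sketch does not handle quotes in values");
        }
        return "annotation." + key + " = \"" + value + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical order ID used purely for illustration.
        System.out.println(equalsExpression("orderId", "ORD-1001"));
    }
}
```

The same lookup is not possible for orderDetails in the example above, because metadata is stored with the trace but never indexed.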

Sampling Rules

Default sampling: the first request each second (a reservoir of 1) plus 5% of any additional requests
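
To make the reservoir-plus-rate idea concrete, here is a minimal simulation of a sampling decision. It is a sketch under simplifying assumptions: the class name is hypothetical, the decision is made locally per second, and the real service coordinates quotas centrally across instances, which this code does not attempt to model.

```java
import java.util.function.DoubleSupplier;

public class SamplingSketch {
    private final int reservoirSize;     // requests per second that are always traced
    private final double fixedRate;      // fraction of the remaining requests to trace
    private final DoubleSupplier random; // source of values in [0, 1), injectable for tests

    private long currentSecond = -1;
    private int usedThisSecond = 0;

    public SamplingSketch(int reservoirSize, double fixedRate, DoubleSupplier random) {
        this.reservoirSize = reservoirSize;
        this.fixedRate = fixedRate;
        this.random = random;
    }

    /** Decides whether the request arriving in the given second should be traced. */
    public synchronized boolean shouldSample(long epochSecond) {
        if (epochSecond != currentSecond) {
            // A new second started: the reservoir is full again.
            currentSecond = epochSecond;
            usedThisSecond = 0;
        }
        if (usedThisSecond < reservoirSize) {
            usedThisSecond++;
            return true;
        }
        // Reservoir exhausted: fall back to the fixed rate.
        return random.getAsDouble() < fixedRate;
    }

    public static void main(String[] args) {
        // Default rule: reservoir of 1 and a fixed rate of 5%.
        SamplingSketch sampler = new SamplingSketch(1, 0.05, Math::random);
        int sampled = 0;
        for (int i = 0; i < 1000; i++) {
            if (sampler.shouldSample(0L)) {
                sampled++;
            }
        }
        System.out.println("Sampled " + sampled + " of 1000 requests in one second");
    }
}
```

With 1,000 requests in a single second, roughly 1 + 0.05 × 999 ≈ 51 would be traced on average; the exact count varies from run to run.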

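Custom rules (configured in the console or via the API; the CreateSamplingRule API also requires ResourceARN, Host, ServiceType, and Version):

{
  "RuleName": "HighValueOrders",
  "ResourceARN": "*",
  "Priority": 1,
  "ReservoirSize": 10,
  "FixedRate": 0.5,
  "Host": "*",
  "URLPath": "/orders/*",
  "ServiceName": "order-service",
  "ServiceType": "*",
  "HTTPMethod": "POST",
  "Version": 1
}
  • ReservoirSize: number of matching requests per second to trace before the fixed rate applies
  • FixedRate: fraction (0 to 1) of requests beyond the reservoir to trace; 0.5 means 50%
  • Priority: a lower number means higher priority; rules are evaluated in ascending order and the first match wins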
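
The priority ordering can be illustrated with a small matcher. This is a sketch, not the service's implementation: the Rule class is a hypothetical, trimmed-down view with only three match fields, and the "AllPosts" rule is invented for the example. It assumes the documented wildcard semantics, where * matches any run of characters and ? matches a single character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class RuleMatcher {
    /** Hypothetical, trimmed-down sampling rule holding only the fields matched here. */
    public static final class Rule {
        final String name;
        final int priority;
        final String serviceName;
        final String httpMethod;
        final String urlPath;

        public Rule(String name, int priority, String serviceName, String httpMethod, String urlPath) {
            this.name = name;
            this.priority = priority;
            this.serviceName = serviceName;
            this.httpMethod = httpMethod;
            this.urlPath = urlPath;
        }
    }

    /** Matches a value against a glob where * is any run of characters and ? is one character. */
    public static boolean globMatches(String glob, String value) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), value);
    }

    /** Returns the name of the first matching rule in ascending priority order. */
    public static String firstMatch(List<Rule> rules, String service, String method, String path) {
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingInt((Rule r) -> r.priority));
        for (Rule r : sorted) {
            if (globMatches(r.serviceName, service)
                    && globMatches(r.httpMethod, method)
                    && globMatches(r.urlPath, path)) {
                return r.name;
            }
        }
        // Nothing matched: the built-in Default rule applies last.
        return "Default";
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("AllPosts", 50, "*", "POST", "*"),
                new Rule("HighValueOrders", 1, "order-service", "POST", "/orders/*"));
        System.out.println(firstMatch(rules, "order-service", "POST", "/orders/42"));
    }
}
```

Even though AllPosts also matches the request, HighValueOrders is chosen because its priority number is lower.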

X-Ray Daemon

The X-Ray SDK sends segment data over UDP to the X-Ray daemon, which buffers it and forwards it in batches to the X-Ray API:

Your App → X-Ray SDK → UDP port 2000 → X-Ray Daemon → X-Ray API
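
The SDK normally handles this hop for you, but a rough sketch of it can help when debugging connectivity. The code below assumes the documented daemon wire format (a one-line JSON header followed by a segment document) and a daemon listening on 127.0.0.1:2000; the class and method names are hypothetical, and the segment carries only the minimal fields.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Locale;

public class DaemonSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Header line the daemon expects before each segment document.
    public static final String HEADER = "{\"format\": \"json\", \"version\": 1}\n";

    public static String randomHex(int digits) {
        StringBuilder sb = new StringBuilder(digits);
        for (int i = 0; i < digits; i++) {
            sb.append(Character.forDigit(RANDOM.nextInt(16), 16));
        }
        return sb.toString();
    }

    /** Trace ID format: 1-<8 hex digits of epoch seconds>-<24 random hex digits>. */
    public static String newTraceId(long epochSecond) {
        return "1-" + String.format("%08x", epochSecond) + "-" + randomHex(24);
    }

    /** Builds the UDP payload: header line plus a minimal completed segment. */
    public static String buildPayload(String name, String traceId, String segmentId,
                                      double startTime, double endTime) {
        String segment = String.format(Locale.ROOT,
                "{\"name\": \"%s\", \"id\": \"%s\", \"trace_id\": \"%s\", "
                        + "\"start_time\": %.3f, \"end_time\": %.3f}",
                name, segmentId, traceId, startTime, endTime);
        return HEADER + segment;
    }

    /** Fire-and-forget send; UDP gives no confirmation that the daemon received it. */
    public static void send(String payload, String host, int port) throws IOException {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(bytes, bytes.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws IOException {
        long now = System.currentTimeMillis() / 1000;
        String payload = buildPayload("my-spring-service", newTraceId(now), randomHex(16), now, now + 0.25);
        send(payload, "127.0.0.1", 2000); // assumes a local daemon on the default port
    }
}
```

Because the transport is UDP, the application never blocks on tracing and a missing daemon does not cause request failures; the trace data is simply lost.
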

| Environment | Daemon Location |
|---|---|
| Lambda | Built-in (enable Active Tracing) |
| ECS | Sidecar container |
| EC2 | Install manually or via User Data |
| Elastic Beanstalk | Built-in (enable in config) |

# Lambda: enable X-Ray in SAM
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Tracing: Active  # PassThrough = trace only when the upstream caller already sampled the request

Java SDK Integration

Maven Dependency

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-xray-recorder-sdk-core</artifactId>
    <version>2.14.0</version>
</dependency>
<!-- AWS SDK instrumentation -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-xray-recorder-sdk-aws-sdk-v2-instrumentor</artifactId>
    <version>2.14.0</version>
</dependency>

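Spring Boot Integration

import com.amazonaws.xray.javax.servlet.AWSXRayServletFilter;
import javax.servlet.Filter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class XRayConfig {

    // Opens a segment named "my-spring-service" for every incoming HTTP request
    @Bean
    public Filter tracingFilter() {
        return new AWSXRayServletFilter("my-spring-service");
    }
}

// All AWS SDK calls (DynamoDB, S3, SQS...) are auto-instrumented
// when aws-xray-recorder-sdk-aws-sdk-v2-instrumentor is on the classpath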

Manual Subsegments

@Service
public class PaymentService {

    public void processPayment(String orderId) {
        // Creates a subsegment in the current trace
        AWSXRay.createSubsegment("PaymentGateway", (subsegment) -> {
            subsegment.putAnnotation("orderId", orderId);

            // Call external payment gateway
            paymentGateway.charge(orderId);

            return null;
        });
    }
}

What X-Ray Traces

With the SDK and instrumentation:

  • Incoming HTTP requests (via filter)
  • AWS SDK calls (DynamoDB, S3, SQS, SNS, Lambda...)
  • SQL queries (JDBC instrumentation)
  • HTTP client calls (Apache HttpClient, via the aws-xray-recorder-sdk-apache-http module)
  • Custom business logic (manual subsegments)

🎯 DVA-C02 Exam Tips

Quick Exam Rules
  • Annotations vs Metadata: Annotations are indexed and can be used in filter expressions (searchable). Metadata is not indexed and is only used to store extra detail.
  • Instrumenting other services: To trace outgoing AWS SDK calls from your application, you must wrap or instrument your AWS SDK clients using the X-Ray SDK interceptors.
  • Sampling: To control costs or trace specific routes more frequently, adjust Sampling Rules.
  • X-Ray Daemon: The X-Ray daemon buffers and sends trace data to the X-Ray API. It can run as a sidecar container in ECS or as a separate process on EC2.
  • X-Ray Groups: Define a group with a filter expression to collect and view only the traces that match it.
  • Enabling tracing: Turn on Active Tracing for Lambda; on ECS and EC2, run the X-Ray daemon alongside the application.

🧪 Practice Questions

Q1. A developer needs to search X-Ray traces for all requests where userId = "user-123". What should they use to make this possible?

A) X-Ray Metadata with key userId
B) X-Ray Annotation with key userId
C) CloudWatch Log filter
D) X-Ray Groups

✅ Answer & Explanation

B: Annotations are indexed and searchable. Metadata is stored with the trace but not indexed and cannot be used in filter expressions. Use putAnnotation("userId", userId).


Q2. A Lambda function has X-Ray Active Tracing enabled. However, DynamoDB calls are not appearing as subsegments. What is missing?

A) X-Ray Daemon is not installed
B) The IAM role lacks xray:PutTraceSegments
C) The X-Ray SDK AWS SDK instrumentor is not on the classpath
D) X-Ray sampling rate is too low

✅ Answer & Explanation

C: To auto-instrument AWS SDK v2 calls, you need aws-xray-recorder-sdk-aws-sdk-v2-instrumentor on the classpath. Without it, DynamoDB/S3/SQS calls won't appear as subsegments.


Q3. By default, what percentage of requests does X-Ray trace?

A) 100%
B) 10%
C) 5% + 1 request/second reservoir
D) 1%

✅ Answer & Explanation

C: The default sampling rule traces the first request each second (reservoir) and 5% of all additional requests. This balances visibility with cost.


Interview Questions (Senior Level)

  1. How do you define sampling strategy across low-traffic critical paths and high-volume commodity endpoints?
  2. What annotation taxonomy would you standardize to support incident debugging without exploding cardinality?
  3. A service map shows healthy latency but users report timeouts. How would you triage observability blind spots?
  4. How do you combine X-Ray with logs and metrics for reliable root-cause analysis in distributed systems?

Short answer guide:

  • Use prioritized sampling rules per endpoint criticality and error budget.
  • Keep searchable annotations bounded and domain-oriented.
  • Validate missing spans, upstream retries, and client-side latency not visible in traces.
  • Correlate trace IDs with structured logs and SLO metrics.
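
As one possible way to correlate trace IDs with logs, the sketch below parses a tracing header of the form Root=...;Parent=...;Sampled=... and turns it into a log prefix. It assumes the header value is available to the application, for example from the X-Amzn-Trace-Id HTTP header or Lambda's _X_AMZN_TRACE_ID environment variable; the class name, method names, and log format are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TraceHeader {
    /** Splits "Root=...;Parent=...;Sampled=1" into its key-value fields. */
    public static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (header == null) {
            return fields;
        }
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    /** Builds a prefix that can be prepended to structured log lines. */
    public static String logPrefix(String header) {
        Map<String, String> fields = parse(header);
        String root = fields.getOrDefault("Root", "unknown");
        boolean sampled = "1".equals(fields.get("Sampled"));
        return "[traceId=" + root + " sampled=" + sampled + "]";
    }

    public static void main(String[] args) {
        // Falls back to a sample value when not running inside Lambda.
        String header = System.getenv("_X_AMZN_TRACE_ID");
        if (header == null) {
            header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1";
        }
        System.out.println(logPrefix(header) + " order accepted");
    }
}
```

Searching logs for the Root value then leads straight from a log line to the matching trace, and the sampled flag explains why some log lines have no trace behind them.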