Amazon SQS (Simple Queue Service)
Core concept: SQS decouples producers from consumers. Messages wait safely in the queue even if the consumer is down or slow.
What Is SQS?
SQS is a fully managed message queuing service. Think of it as a post office: senders drop letters (messages) in the mailbox (queue), and recipients pick them up at their own pace.
Why decouple? If your web server calls a payment service directly and the payment service is slow, your users wait. With SQS, the web server drops a message in the queue and responds immediately, while the payment service processes it asynchronously.
Standard vs FIFO Queue
| Feature | Standard | FIFO |
|---|---|---|
| Throughput | Nearly unlimited | 300 TPS (3,000 messages/s with batching; up to 70,000 with high-throughput mode) |
| Ordering | Best-effort | Strict FIFO (within message group) |
| Delivery | At-least-once (possible duplicates) | Exactly-once processing |
| Deduplication | No | Yes (5-minute deduplication window) |
| Message Groups | No | Yes (parallel processing per group) |
| Naming | Any | Must end in .fifo |
FIFO Message Group ID
Queue: orders.fifo
├── GroupID: "customer-A" → Messages processed in order for customer A
├── GroupID: "customer-B" → Messages processed in order for customer B
└── GroupID: "customer-C" → Messages processed in order for customer C
    ↑ Different groups process IN PARALLEL
FIFO Deduplication
| Method | How |
|---|---|
| Content-based | SHA-256 hash of body (enable on queue) |
| MessageDeduplicationId | You provide a unique ID per message |
sqsClient.sendMessage(SendMessageRequest.builder()
        .queueUrl("https://sqs.../orders.fifo")
        .messageBody("{\"orderId\": \"ORD-123\", \"action\": \"process\"}")
        .messageGroupId("customer-A")
        .messageDeduplicationId("ORD-123-process") // Prevents duplicates within the 5-minute window
        .build());
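To make the 5-minute window concrete, here is a toy in-memory model in plain Java. It is a sketch for intuition only, not how SQS is implemented: `DedupWindowSketch`, `contentBasedId`, and `accept` are hypothetical names, and the clock is passed in as a parameter so the behavior is easy to follow.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of the FIFO deduplication window (not the SQS API).
final class DedupWindowSketch {
    // 5-minute window, as in the comparison table above.
    static final long WINDOW_MILLIS = 5 * 60 * 1000L;

    // Deduplication ID -> time at which the first copy was accepted.
    private final Map<String, Long> firstSeen = new HashMap<>();

    // Content-based ID: hex-encoded SHA-256 hash of the message body.
    static String contentBasedId(String body) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns true if the message is accepted, false if it duplicates
    // a message accepted less than WINDOW_MILLIS earlier.
    boolean accept(String dedupId, long nowMillis) {
        Long seenAt = firstSeen.get(dedupId);
        if (seenAt != null && nowMillis - seenAt < WINDOW_MILLIS) {
            return false;
        }
        firstSeen.put(dedupId, nowMillis);
        return true;
    }
}
```

In this model, a second copy with the same ID is rejected one minute after the first, and accepted again once five minutes have passed.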
- Financial transactions → FIFO (order matters, no duplicates)
- Log processing → Standard (order doesn't matter, high throughput)
- IoT telemetry → Standard (volume matters more than order)
- Order processing per customer → FIFO with MessageGroupId = customerId
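The per-customer grouping in the last bullet can be modelled in a few lines of plain Java. This is an in-memory sketch rather than SQS itself: `Msg` and `partitionByGroup` are hypothetical names. The point is that arrival order is kept inside each group, while separate groups could be handed to separate workers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of FIFO message groups (not the SQS API).
final class MessageGroupSketch {

    // Hypothetical message type: a group ID plus a body.
    static final class Msg {
        final String groupId;
        final String body;

        Msg(String groupId, String body) {
            this.groupId = groupId;
            this.body = body;
        }
    }

    // Splits one arrival-ordered stream into per-group lists.
    // Order is preserved within each group; nothing is promised across groups.
    static Map<String, List<String>> partitionByGroup(List<Msg> arrivals) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Msg m : arrivals) {
            groups.computeIfAbsent(m.groupId, k -> new ArrayList<>()).add(m.body);
        }
        return groups;
    }
}
```

With MessageGroupId = customerId, each customer's orders stay in sequence even when messages from many customers are interleaved in the queue.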
Key Parameters
| Parameter | Default | Max | Description |
|---|---|---|---|
| Message retention | 4 days | 14 days | How long messages stay in queue |
| Visibility timeout | 30 sec | 12 hours | Message hidden from others during processing |
| Max message size | 256 KB | 256 KB | Use SQS Extended Client for larger |
| Delivery delay | 0 sec | 15 min | Delay before message becomes visible |
| Receive wait time | 0 sec | 20 sec | Long polling duration |
| Max receive count | N/A | N/A | Number of receives before a message is sent to the DLQ (set in the redrive policy) |
Visibility Timeout
Producer → [Message in Queue]
                 ↓
Consumer receives message → message INVISIBLE for 30s (default)
                 ↓
     Consumer finishes in < 30s?
        ├── Yes → Delete message → DONE
        └── No  → Timeout → message reappears
                  → ANOTHER consumer picks up
                  → DUPLICATE PROCESSING!
Setting Visibility Timeout
Rule of thumb: Set visibility timeout ≥ 6× your average processing time
// Extend visibility timeout if processing takes longer
sqsClient.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
        .queueUrl(queueUrl)
        .receiptHandle(message.receiptHandle())
        .visibilityTimeout(120) // Extend to 2 minutes
        .build());
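When Lambda processes SQS through an event source mapping (ESM), set the queue's visibility timeout to at least 6× the Lambda function timeout, plus MaximumBatchingWindowInSeconds if a batching window is configured. Lambda does not extend the visibility timeout while your function runs; the headroom leaves time for retries when invocations are throttled.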
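The sizing rule is simple arithmetic, sketched below as a small helper. `VisibilityTimeoutSketch` and `recommendedSeconds` are hypothetical names. Adding the batching window follows AWS's Lambda guidance for SQS event sources, and the cap is the 12-hour maximum from the parameter table.

```java
// Sizing helper for the queue's visibility timeout when Lambda is the consumer.
final class VisibilityTimeoutSketch {
    // 12-hour maximum visibility timeout, in seconds.
    static final int MAX_VISIBILITY_SECONDS = 12 * 60 * 60;

    // 6 x function timeout plus the batching window, capped at the SQS maximum.
    static int recommendedSeconds(int functionTimeoutSeconds, int batchingWindowSeconds) {
        long suggested = 6L * functionTimeoutSeconds + batchingWindowSeconds;
        return (int) Math.min(suggested, MAX_VISIBILITY_SECONDS);
    }
}
```

For a function with a 45-second timeout and a 5-second batching window, `recommendedSeconds(45, 5)` gives 275 seconds.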
Dead Letter Queue (DLQ)
Amazon SQS supports dead letter queues (DLQs) to isolate unprocessable messages (poison pills) from healthy source queues.
For the complete design principles, retention strategies, and exact CLI redrive instructions (start-message-move-task), see the centralized AWS SQS DLQ & Redrive section.
Long Polling vs Short Polling
| Type | Behavior | Cost | API Calls |
|---|---|---|---|
| Short (default) | Returns immediately (empty or not) | Higher | Many empty responses |
| Long | Waits up to 20s for a message | Lower | Fewer calls |
// Long polling at queue level (all consumers)
sqsClient.setQueueAttributes(SetQueueAttributesRequest.builder()
        .queueUrl(queueUrl)
        .attributes(Map.of(QueueAttributeName.RECEIVE_MESSAGE_WAIT_TIME_SECONDS, "20"))
        .build());

// Or per-request long polling
ReceiveMessageResponse response = sqsClient.receiveMessage(ReceiveMessageRequest.builder()
        .queueUrl(queueUrl)
        .waitTimeSeconds(20)     // Long poll for up to 20s
        .maxNumberOfMessages(10) // Batch up to 10 messages
        .build());
Long polling reduces SQS API costs and latency. Set WaitTimeSeconds > 0.
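A rough count shows where the savings come from. The helper below is a back-of-the-envelope sketch with hypothetical names (`PollingCostSketch`, `emptyReceivesPerPeriod`). It assumes a single consumer polling an empty queue, and that a short-polling loop issues about one call per second, which is an illustrative figure rather than anything SQS prescribes.

```java
// Back-of-the-envelope count of ReceiveMessage calls against an EMPTY queue.
final class PollingCostSketch {

    // Calls made by one consumer during periodSeconds if each call
    // (including any wait or sleep) takes secondsPerCall to complete.
    static long emptyReceivesPerPeriod(long periodSeconds, long secondsPerCall) {
        if (secondsPerCall <= 0) {
            throw new IllegalArgumentException("secondsPerCall must be positive");
        }
        return periodSeconds / secondsPerCall;
    }
}
```

Over one hour, a 1-second short-polling loop makes 3,600 empty calls, while long polling with a 20-second wait makes 180.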
Lambda Integration (Event Source Mapping)
SQS Queue → Lambda ESM (managed polling) → Lambda Function
Batch Processing with Partial Failures
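import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSBatchResponse;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import java.util.ArrayList;
import java.util.List;

public class OrderProcessor implements RequestHandler<SQSEvent, SQSBatchResponse> {
    @Override
    public SQSBatchResponse handleRequest(SQSEvent event, Context context) {
        List<SQSBatchResponse.BatchItemFailure> failures = new ArrayList<>();
        for (SQSEvent.SQSMessage msg : event.getRecords()) {
            try {
                Order order = parseOrder(msg.getBody());
                processOrder(order);
                // Success: message will be deleted from the queue
            } catch (Exception e) {
                context.getLogger().log("Failed: " + msg.getMessageId());
                // Report only THIS message so it alone returns to the queue
                failures.add(SQSBatchResponse.BatchItemFailure.builder()
                        .withItemIdentifier(msg.getMessageId())
                        .build());
            }
        }
        // An empty failure list tells Lambda the whole batch succeeded
        return SQSBatchResponse.builder()
                .withBatchItemFailures(failures)
                .build();
    }
}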
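It can help to see the JSON that such a handler hands back to Lambda. The sketch below builds that payload by hand using only the standard library; `BatchResponseJsonSketch` and `toJson` are hypothetical names, and in a real function the Lambda runtime serializes the `SQSBatchResponse` object for you.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hand-rolled rendering of the partial batch response payload.
final class BatchResponseJsonSketch {

    // Builds {"batchItemFailures":[{"itemIdentifier":"..."}, ...]}.
    // Assumes the IDs need no JSON escaping (SQS message IDs are UUID-like).
    static String toJson(List<String> failedMessageIds) {
        String items = failedMessageIds.stream()
                .map(id -> "{\"itemIdentifier\":\"" + id + "\"}")
                .collect(Collectors.joining(","));
        return "{\"batchItemFailures\":[" + items + "]}";
    }
}
```

An empty list renders as `{"batchItemFailures":[]}`, which Lambda treats as full success for the batch.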
ESM Configuration
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Events:
      SQSEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt MyQueue.Arn
          BatchSize: 10
          MaximumBatchingWindowInSeconds: 5 # Wait up to 5s to fill batch
          FunctionResponseTypes:
            - ReportBatchItemFailures # MUST enable for partial failures
SQS Extended Client (Large Messages)
For messages >256KB, store payload in S3:
// Producer stores large payload in S3, sends reference via SQS
AmazonSQSExtendedClient extendedSqsClient = new AmazonSQSExtendedClient(
        sqsClient,
        new ExtendedClientConfiguration()
                .withPayloadSupportEnabled(s3Client, "sqs-large-payloads-bucket")
                .withAlwaysThroughS3(false) // Only use S3 for messages > 256KB
);
// Consumer automatically downloads from S3
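The size check behind "only use S3 for messages > 256KB" comes down to counting bytes, not characters. The sketch below shows that check in isolation, assuming the 256 KB limit stated above. `PayloadSizeSketch` and `needsS3` are hypothetical names, and the real library may also count message attributes toward the size.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone version of the "does this body exceed the SQS limit?" check.
final class PayloadSizeSketch {
    // 256 KB expressed in bytes (262,144).
    static final int MAX_SQS_BYTES = 256 * 1024;

    // True if the UTF-8 encoded body is larger than the limit.
    static boolean needsS3(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length > MAX_SQS_BYTES;
    }
}
```

Multi-byte characters matter here: a body of 131,073 "é" characters is under the limit by character count but encodes to 262,146 bytes, so it would go through S3.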
Patterns
Fan-Out: SNS + SQS
Event → SNS Topic → SQS Queue 1 (service A)
                  → SQS Queue 2 (service B)
                  → SQS Queue 3 (service C)
Throttling: SQS as Buffer
API Gateway (burst) → SQS Queue → Lambda (controlled concurrency)
Set Lambda ESM's MaximumConcurrency to control processing rate.
DVA-C02 Exam Tips
- Visibility timeout ≥ 6× processing time (prevents duplicates)
- FIFO = guaranteed order + exactly-once. Standard = unlimited throughput
- DLQ must be same type as source queue
- Long polling reduces API costs (WaitTimeSeconds > 0)
- ReportBatchItemFailures = partial batch failure handling
- Max message size = 256 KB. Use Extended Client for larger
- Message retention = max 14 days
- FIFO MessageGroupId enables parallel processing per group
- Delay Queue = delay message visibility up to 15 minutes
- Lambda + FIFO = at most one concurrent Lambda invocation per message group
Practice Questions
Q1. Lambda takes 45s, visibility timeout is 30s. What happens?
A) Lambda timeout
B) Message reappears, may be processed twice
C) Lambda auto-extends timeout
D) Message goes to DLQ
Answer & Explanation
B: The visibility timeout expires at 30s while Lambda is still processing. The message becomes visible again and another consumer can pick it up, which causes duplicate processing.
Q2. Prevent duplicate messages within 5 minutes. Which feature?
A) Standard queue
B) FIFO queue deduplication
C) Visibility timeout
D) DLQ
Answer & Explanation
B: FIFO deduplication (MessageDeduplicationId or content-based) prevents duplicates within the 5-minute window.
Q3. Reduce costs polling empty queue. What to configure?
A) Increase retention
B) Enable FIFO
C) Long polling (WaitTimeSeconds > 0)
D) Reduce batch size
Answer & Explanation
C: Long polling waits up to 20s for messages, reducing empty-response API calls.
Q4. Lambda processes 10 SQS messages. 1 fails. How to retry only the failed one?
A) Set maxReceiveCount to 1
B) Enable ReportBatchItemFailures, return failed messageId
C) Catch exception and ignore
D) Use FIFO queue
Answer & Explanation
B: With ReportBatchItemFailures, return only the failed messageId. The 9 successful messages are deleted; only the failed one returns to the queue.