AWS Step Functions
Core concept: Step Functions orchestrates multi-step workflows as state machines that coordinate Lambda, SQS, DynamoDB, ECS, and 200+ other AWS services.
Standard vs Express Workflows
| Feature | Standard | Express |
|---|---|---|
| Max duration | 1 year | 5 minutes |
| Execution model | Exactly-once | At-least-once (asynchronous); at-most-once (synchronous) |
| Execution history | Full history in console | CloudWatch Logs only |
| Pricing | Per state transition | Per execution + duration |
| Use case | Long-running business processes | High-volume, short workflows |
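Asynchronous Express executions are at-least-once, so a step can occasionally run more than once for the same input. For that reason, task code is usually written to be idempotent. Below is a minimal Python sketch. It assumes a hypothetical `process_payment` handler keyed by `orderId`, with an in-memory set standing in for a durable store.

```python
from typing import Dict, Set

# In-memory stand-in for a durable idempotency store (hypothetical; a real
# handler would persist this, e.g. with a DynamoDB conditional write).
_processed: Set[str] = set()
charges: Dict[str, int] = {}


def process_payment(event: dict) -> dict:
    """Hypothetical task handler that is safe to run more than once per order."""
    order_id = event["orderId"]
    if order_id in _processed:
        # Duplicate delivery: skip the side effect and report the duplicate.
        return {"orderId": order_id, "paymentStatus": "APPROVED", "duplicate": True}
    charges[order_id] = charges.get(order_id, 0) + event["amount"]
    _processed.add(order_id)
    return {"orderId": order_id, "paymentStatus": "APPROVED", "duplicate": False}


# Simulate the same step being delivered twice.
first = process_payment({"orderId": "o-1", "amount": 50})
second = process_payment({"orderId": "o-1", "amount": 50})
print(charges)  # {'o-1': 50}
```

Standard workflows are exactly-once, so this guard matters mainly for Express workflows. It also matters for tasks that are retried after a partial success.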
State Types
| State | Purpose |
|---|---|
| Task | Do work (invoke Lambda, call API, etc.) |
| Choice | Branch based on conditions |
| Wait | Pause for a duration or until a timestamp |
| Parallel | Execute branches simultaneously |
| Map | Iterate over an array |
| Pass | Pass input to output (for testing/transformation) |
| Succeed | End the workflow successfully |
| Fail | End the workflow with an error |
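States hand control to one another through `Next` (or `End`). A Choice state picks its `Next` from its rules, while Succeed and Fail stop the run. The toy Python interpreter below walks a small subset of these states to show that control flow. It is a local illustration, not how AWS executes state machines. As an assumption, Task `Resource` values are looked up in a local `tasks` dict, and Choice supports only `StringEquals` on a top-level `$.field`.

```python
from typing import Any, Callable, Dict


def run_state_machine(definition: Dict[str, Any],
                      tasks: Dict[str, Callable[[dict], dict]],
                      data: dict) -> dict:
    """Toy interpreter for Task, Pass, Choice (StringEquals only), Succeed, Fail."""
    state_name = definition["StartAt"]
    while True:
        state = definition["States"][state_name]
        kind = state["Type"]
        if kind == "Task":
            data = tasks[state["Resource"]](data)  # local function, not a real ARN
        elif kind == "Pass":
            data = state.get("Result", data)  # optional fixed Result replaces the input
        elif kind == "Choice":
            next_state = state.get("Default")
            for rule in state["Choices"]:
                field = rule["Variable"][2:]  # assumes a top-level "$.field" path
                if data.get(field) == rule["StringEquals"]:
                    next_state = rule["Next"]
                    break
            if next_state is None:
                raise RuntimeError("States.NoChoiceMatched")
            state_name = next_state
            continue
        elif kind == "Succeed":
            return data
        elif kind == "Fail":
            raise RuntimeError(state.get("Error", "States.Fail"))
        else:
            raise NotImplementedError(kind)
        if state.get("End"):
            return data
        state_name = state["Next"]


definition = {
    "StartAt": "Charge",
    "States": {
        "Charge": {"Type": "Task", "Resource": "charge", "Next": "IsApproved"},
        "IsApproved": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.paymentStatus",
                         "StringEquals": "APPROVED", "Next": "Done"}],
            "Default": "Declined",
        },
        "Done": {"Type": "Succeed"},
        "Declined": {"Type": "Fail", "Error": "PaymentDeclined"},
    },
}
# Hypothetical task: approve small amounts, decline large ones.
tasks = {"charge": lambda d: {**d, "paymentStatus": "APPROVED" if d["amount"] < 100 else "DECLINED"}}

result = run_state_machine(definition, tasks, {"amount": 40})
print(result)  # {'amount': 40, 'paymentStatus': 'APPROVED'}
```

Running the same definition with `{"amount": 500}` follows the Choice `Default` into the Fail state. That raises `PaymentDeclined`.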
State Machine Definition (ASL)
```json
{
  "Comment": "Order Processing Workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:ValidateOrder",
      "Next": "ProcessPayment",
      "Catch": [{
        "ErrorEquals": ["ValidationError"],
        "Next": "SendFailureNotification"
      }]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:ProcessPayment",
      "Retry": [{
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0
      }],
      "Next": "IsPaymentApproved"
    },
    "IsPaymentApproved": {
      "Type": "Choice",
      "Choices": [{
        "Variable": "$.paymentStatus",
        "StringEquals": "APPROVED",
        "Next": "FulfillOrder"
      }],
      "Default": "SendPaymentFailed"
    },
    "FulfillOrder": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "UpdateInventory",
          "States": {
            "UpdateInventory": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:...:UpdateInventory",
              "End": true
            }
          }
        },
        {
          "StartAt": "SendConfirmationEmail",
          "States": {
            "SendConfirmationEmail": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:...:SendEmail",
              "End": true
            }
          }
        }
      ],
      "End": true
    },
    "SendFailureNotification": {
      "Comment": "Hypothetical placeholder task; the function name is illustrative",
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:SendFailureNotification",
      "End": true
    },
    "SendPaymentFailed": {
      "Comment": "Hypothetical placeholder task; the function name is illustrative",
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:SendPaymentFailed",
      "End": true
    }
  }
}
```
The last two states are illustrative placeholders. They ensure every `Next` and `Default` target resolves to a state defined in `States`.
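Every `StartAt`, `Next`, `Default`, Choice rule and Catch target must name a state in the same `States` object. Each Parallel branch has its own scope. A misspelled or missing state name therefore makes the definition invalid. The Python sketch below performs a simplified version of that check on a definition loaded as a dict. It is an illustration under those assumptions, not the validator AWS runs.

```python
from typing import Any, Dict, List


def missing_targets(scope: Dict[str, Any], path: str = "") -> List[str]:
    """Return 'State -> Target' strings for transitions that point to undefined states.

    Each Parallel branch and Map item processor is checked as its own scope.
    """
    states = scope["States"]
    problems: List[str] = []
    if scope["StartAt"] not in states:
        problems.append(f"{path}StartAt -> {scope['StartAt']}")
    for name, state in states.items():
        targets = []
        if "Next" in state:
            targets.append(state["Next"])
        if "Default" in state:
            targets.append(state["Default"])
        targets += [rule["Next"] for rule in state.get("Choices", [])]
        targets += [catcher["Next"] for catcher in state.get("Catch", [])]
        for target in targets:
            if target not in states:
                problems.append(f"{path}{name} -> {target}")
        # Nested scopes: Parallel branches and Map item processors.
        for branch in state.get("Branches", []):
            problems += missing_targets(branch, f"{path}{name}/")
        for key in ("ItemProcessor", "Iterator"):
            if key in state:
                problems += missing_targets(state[key], f"{path}{name}/")
    return problems


# A trimmed order workflow whose failure-path states were never defined.
broken = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task", "Resource": "arn:aws:lambda:...:ValidateOrder",
            "Next": "IsPaymentApproved",
            "Catch": [{"ErrorEquals": ["ValidationError"], "Next": "SendFailureNotification"}],
        },
        "IsPaymentApproved": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.paymentStatus", "StringEquals": "APPROVED", "Next": "Done"}],
            "Default": "SendPaymentFailed",
        },
        "Done": {"Type": "Succeed"},
    },
}
print(missing_targets(broken))
# ['ValidateOrder -> SendFailureNotification', 'IsPaymentApproved -> SendPaymentFailed']
```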
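Error Handling
Retry
```json
"Retry": [{
  "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
  "IntervalSeconds": 1,
  "MaxAttempts": 3,
  "BackoffRate": 2.0
}]
```
With `IntervalSeconds` 1 and `BackoffRate` 2.0, the three retries wait 1s, 2s and 4s. Each interval is the previous one multiplied by `BackoffRate`.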
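The backoff arithmetic can be checked with a small Python function. It computes the wait before each retry as `IntervalSeconds * BackoffRate ** n`. The defaults mirror the ASL Retrier defaults (IntervalSeconds 1, MaxAttempts 3, BackoffRate 2.0). This is a sketch of the schedule only and ignores optional retrier fields not covered here.

```python
from typing import List


def retry_delays(interval_seconds: float = 1,
                 max_attempts: int = 3,
                 backoff_rate: float = 2.0) -> List[float]:
    """Seconds waited before each retry: interval * backoff_rate ** n."""
    return [float(interval_seconds * backoff_rate ** n) for n in range(max_attempts)]


print(retry_delays(1, 3, 2.0))  # [1.0, 2.0, 4.0]  (the Lambda retrier above)
print(retry_delays(2, 3, 2.0))  # [2.0, 4.0, 8.0]  (the ProcessPayment retrier)
```

`MaxAttempts: 0` yields an empty schedule, meaning the error is never retried.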
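Catch
```json
"Catch": [{
  "ErrorEquals": ["PaymentDeclined"],
  "ResultPath": "$.error",
  "Next": "HandlePaymentError"
}]
```
`"ResultPath": "$.error"` writes the error object (its `Error` and `Cause` fields) to `$.error` and keeps the original input. Without it, the default `ResultPath` of `$` replaces the state input with the error object.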
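The difference between `"$.error"` and the default `"$"` shows up in a small Python sketch of ResultPath. It is simplified to dotted field paths only: no array indexes, filters or `null`. The input and error values are sample data.

```python
import copy
from typing import Any, Dict


def apply_result_path(state_input: Dict[str, Any], result: Any, result_path: str = "$") -> Any:
    """Simplified ResultPath: '$' replaces the input; '$.a.b' writes result at that key path."""
    if result_path == "$":
        return result
    output = copy.deepcopy(state_input)  # leave the caller's input untouched
    node = output
    keys = result_path[2:].split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = result
    return output


state_input = {"orderId": "o-1", "amount": 50}
error_output = {"Error": "PaymentDeclined", "Cause": "Card expired"}  # sample values

print(apply_result_path(state_input, error_output, "$.error"))
# {'orderId': 'o-1', 'amount': 50, 'error': {'Error': 'PaymentDeclined', 'Cause': 'Card expired'}}
print(apply_result_path(state_input, error_output))
# {'Error': 'PaymentDeclined', 'Cause': 'Card expired'}
```

With `$.error`, the error handler still sees `orderId`. With the default, the order details are lost.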
Wait for Callback Pattern
For human approval or external system responses:
```json
"WaitForHumanApproval": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
  "Parameters": {
    "FunctionName": "SendApprovalEmail",
    "Payload": {
      "taskToken.$": "$$.Task.Token",
      "orderId.$": "$.orderId"
    }
  },
  "TimeoutSeconds": 86400,
  "Next": "ProcessApproval"
}
```
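The Lambda function delivers the task token to the approver, for example inside an approval link. The approver's system then calls SendTaskSuccess or SendTaskFailure with that token to resume the workflow. If neither call arrives within `TimeoutSeconds` (86400 seconds, i.e. 24 hours), the task fails with `States.Timeout`.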
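The handshake can be simulated locally to show the control flow. The task blocks on a token until another party reports success or failure, or until the timeout fires. In this Python sketch, `TokenRegistry` and its methods are hypothetical stand-ins named after the real API actions. They do not call AWS.

```python
import json
import threading
import uuid
from typing import Dict, Tuple


class TokenRegistry:
    """Local stand-in for Step Functions' task-token bookkeeping (hypothetical)."""

    def __init__(self) -> None:
        self._events: Dict[str, threading.Event] = {}
        self._results: Dict[str, Tuple[str, str]] = {}
        self._lock = threading.Lock()

    def issue_token(self) -> str:
        token = uuid.uuid4().hex
        with self._lock:
            self._events[token] = threading.Event()
        return token

    def send_task_success(self, token: str, output: str) -> None:
        self._complete(token, ("SUCCESS", output))

    def send_task_failure(self, token: str, error: str) -> None:
        self._complete(token, ("FAILURE", error))

    def _complete(self, token: str, result: Tuple[str, str]) -> None:
        with self._lock:
            self._results[token] = result
            self._events[token].set()

    def wait_for_task_token(self, token: str, timeout_seconds: float) -> dict:
        """Block like a .waitForTaskToken task; raise on failure or timeout."""
        if not self._events[token].wait(timeout_seconds):
            raise TimeoutError("States.Timeout")
        status, payload = self._results[token]
        if status == "FAILURE":
            raise RuntimeError(payload)
        return json.loads(payload)


registry = TokenRegistry()
token = registry.issue_token()  # in AWS, $$.Task.Token is passed to the Lambda
# The "approver" responds from another thread after a short delay.
approver = threading.Timer(0.1, registry.send_task_success,
                           args=(token, json.dumps({"approved": True})))
approver.start()
approval = registry.wait_for_task_token(token, timeout_seconds=5)
print(approval)  # {'approved': True}

unanswered = registry.issue_token()
try:
    registry.wait_for_task_token(unanswered, timeout_seconds=0.1)
    timeout_message = None
except TimeoutError as exc:
    timeout_message = str(exc)
print(timeout_message)  # States.Timeout
```

Against the real service with boto3, the approver would call `send_task_success(taskToken=..., output=...)` or `send_task_failure(taskToken=..., error=..., cause=...)` on a `stepfunctions` client.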
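Map State (Parallel Processing)
```json
"ProcessAllOrders": {
  "Type": "Map",
  "ItemsPath": "$.orders",
  "MaxConcurrency": 10,
  "ItemProcessor": {
    "ProcessorConfig": { "Mode": "INLINE" },
    "StartAt": "ProcessSingleOrder",
    "States": {
      "ProcessSingleOrder": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:...:ProcessOrder",
        "End": true
      }
    }
  },
  "Next": "SendSummary"
}
```
Older definitions use `Iterator` in place of `ItemProcessor`. `Iterator` is deprecated but still accepted.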
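An inline Map state behaves much like a bounded worker pool. The same steps run for every array item, at most `MaxConcurrency` at a time, and the output is an array in input order. The Python sketch below uses `concurrent.futures.ThreadPoolExecutor` as a local analogy, not a description of how Step Functions schedules iterations. `process_order` is a hypothetical stand-in for the ProcessOrder Lambda.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_map(items: List[dict], process: Callable[[dict], dict], max_concurrency: int) -> List[dict]:
    """Apply `process` to every item with bounded parallelism; results keep input order."""
    # In ASL, MaxConcurrency 0 means no limit; model that as one worker per item.
    workers = max_concurrency or max(len(items), 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, items))


# Instrumentation to observe peak concurrency (demo only).
_lock = threading.Lock()
_in_flight = 0
peak = 0


def process_order(order: dict) -> dict:
    """Hypothetical per-item step standing in for the ProcessOrder Lambda."""
    global _in_flight, peak
    with _lock:
        _in_flight += 1
        peak = max(peak, _in_flight)
    time.sleep(0.01)  # simulate work
    with _lock:
        _in_flight -= 1
    return {"orderId": order["orderId"], "status": "PROCESSED"}


orders = [{"orderId": f"o-{i}"} for i in range(25)]
results = run_map(orders, process_order, max_concurrency=10)
print(len(results), results[0], peak <= 10)  # 25 {'orderId': 'o-0', 'status': 'PROCESSED'} True
```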
🧪 Practice Questions
Q1. A workflow needs to process each item in a list in parallel, up to 5 items at a time. Which state type achieves this?
A) Parallel state
B) Choice state with conditions
C) Map state with MaxConcurrency: 5
D) Multiple Task states
Answer & Explanation
C: The Map state iterates over an array, applying the same workflow to each item, and MaxConcurrency caps how many items run at once. A Parallel state runs different branches simultaneously, not the same branch for each item.
Q2. A payment workflow needs to pause and wait for a manual approval that may come hours later via an API call. Which pattern enables this?
A) Wait state with a fixed duration
B) Poll DynamoDB every minute for approval
C) Task state with .waitForTaskToken (Callback pattern)
D) Choice state polling an SQS queue
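Answer & Explanation
C: The Callback pattern (.waitForTaskToken) pauses the task until an external system calls SendTaskSuccess or SendTaskFailure with the token, or until the task's timeout expires. No polling and no fixed wait time are needed. The pattern is available in Standard workflows, which can wait up to one year; Express workflows do not support it.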