Skip to main content

Amazon Kinesis

Core concept: Kinesis handles real-time streaming data โ€” logs, metrics, IoT events, clickstreams.


Kinesis Services Comparisonโ€‹

ServicePurposeRetentionConsumers
Data StreamsReal-time stream processing1โ€“365 daysCustom (Lambda, KCL, SDK)
Data FirehoseLoad streaming data to destinationsNo retentionManaged destinations only
Data AnalyticsSQL / Apache Flink on streamsN/AOutput to streams/destinations

Kinesis Data Streamsโ€‹

Shardsโ€‹

  • 1 shard = 1 MB/s write, 2 MB/s read, 1,000 records/s
  • Add shards to scale (Shard Splitting)
  • Remove shards to reduce cost (Shard Merging)

Partition Keysโ€‹

// Records with same partition key โ†’ same shard (ordered within shard)
PutRecordRequest request = PutRecordRequest.builder()
.streamName("clickstream")
.data(SdkBytes.fromUtf8String(jsonData))
.partitionKey(userId) // Same userId โ†’ same shard โ†’ ordered
.build();

Consumersโ€‹

TypeRead throughputDescription
Standard2 MB/s shared across all consumersPull-based, cheaper
Enhanced Fan-Out2 MB/s per consumer per shardPush-based, dedicated throughput

Lambda ESM for Kinesisโ€‹

  • Lambda polls shards automatically
  • Bisect on error: splits failed batches
  • Tumbling windows: aggregate records over a time window
  • Iterator position: TRIM_HORIZON (from beginning) or LATEST

Kinesis vs SQSโ€‹

AspectKinesis Data StreamsSQS
OrderPer-shard orderingFIFO only
Multiple consumersโœ… All consumers read the same streamโŒ One consumer per message
Replayโœ… Up to retention periodโŒ Message deleted after processing
Real-timeโœ… Sub-secondNear real-time
ProvisioningManual (shards)Automatic
Use caseAnalytics, metrics, logsTask queues, job processing
Kinesis vs SQS keyword triggers
  • "Multiple applications consume same data" โ†’ Kinesis
  • "Replay data from the past" โ†’ Kinesis
  • "Job processing, decoupling" โ†’ SQS
  • "Exactly-once, ordering required, no replay" โ†’ SQS FIFO

Kinesis Data Firehoseโ€‹

  • Fully managed โ€” no shards to manage
  • Destinations: S3, Redshift, OpenSearch, Splunk, HTTP endpoints
  • Buffering: by size (1โ€“128 MB) or time (60โ€“900 seconds)
  • Can transform data with Lambda before delivery
  • Near-real-time (not real-time โ€” has buffer delay)

๐Ÿงช Practice Questionsโ€‹

Q1. An IoT platform generates 5 MB/s of sensor data. Multiple analytics applications need to read the same data simultaneously and replay data from the last 7 days. What service fits best?

A) SQS Standard
B) SQS FIFO
C) Kinesis Data Streams
D) SNS

โœ… Answer & Explanation

C โ€” Kinesis Data Streams supports multiple consumers reading the same data independently, with data retention (configurable up to 365 days) enabling replay. SQS doesn't support multiple consumers or replay.


Q2. You need to load streaming clickstream data into S3 every 5 minutes for batch analytics. Which service requires the least operational overhead?

A) Kinesis Data Streams + custom Lambda
B) Kinesis Data Firehose
C) SQS + Lambda
D) Kinesis Data Analytics

โœ… Answer & Explanation

B โ€” Kinesis Data Firehose is fully managed, handles buffering and S3 delivery natively, with no shards or consumers to manage. It's purpose-built for this use case.


๐Ÿ”— Resourcesโ€‹