Change Data Capture (CDC)
Comprehensive guide on Change Data Capture (CDC), detailing how it works, alternatives comparison, implementation patterns with Debezium and Spring, and deep dives for senior engineers.
A **consumer group** is a set of consumers that collectively consume a topic's partitions. Each partition is assigned to exactly one consumer within the group.
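The one-partition-to-one-consumer rule can be illustrated with a minimal round-robin sketch. This is not Kafka's actual assignor logic (the real range, round-robin, and sticky assignors are more involved); the function name `assign_partitions` and the consumer ids are hypothetical.

```python
from collections import defaultdict

def assign_partitions(partitions, consumers):
    """Round-robin sketch: every partition goes to exactly one group member."""
    assignment = defaultdict(list)
    members = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        # Each partition is handed to a single member; a member may own several.
        assignment[members[i % len(members)]].append(partition)
    return dict(assignment)

# Hypothetical group of four consumers reading a six-partition topic.
assignment = assign_partitions(range(6), ["c1", "c2", "c3", "c4"])
owned = [p for parts in assignment.values() for p in parts]
```

In this sketch some consumers own two partitions and others one, but no partition is ever shared by two members of the same group.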
Consumer lag measures how far a consumer group has fallen behind the latest messages in a topic, counted per partition as the gap between the log end offset and the group's committed offset. It is the most critical health metric for any Kafka-based application.
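As a rough illustration, lag can be computed per partition as the log end offset minus the group's committed offset. The sketch below uses made-up offsets and a hypothetical helper `consumer_lag`; treating a missing committed offset as 0 is an assumption made here for simplicity, not Kafka's behavior in every configuration.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = log end offset - committed offset (never negative)."""
    lag = {}
    for partition, end in log_end_offsets.items():
        # Assumption: a partition with no committed offset is treated as offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag

# Illustrative numbers only.
end_offsets = {0: 1500, 1: 980, 2: 2210}
committed = {0: 1500, 1: 900, 2: 2000}

per_partition = consumer_lag(end_offsets, committed)
total_lag = sum(per_partition.values())
```

Tracking the per-partition values as well as the total helps spot a single stuck partition that a healthy-looking total would hide.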
A comprehensive guide to the Dead Letter Queue (DLQ) pattern — covering poison pill handling, retry strategies, alternatives comparison, AWS SQS / Kafka / RabbitMQ implementations, and production deep dives for senior engineers.
A comprehensive guide to preventing duplicate message processing across Kafka, Kafka Streams, RabbitMQ, SQS, and Redis — covering EOS internals, idempotent consumers, and production deduplication patterns for senior engineers.
A comprehensive guide to managing and verifying application configurations, environment variables priority, HashiCorp Vault secrets, Kafka topics, ACLs, and schema registry compatibility at deploy time.
For keyed messages, Kafka's default partitioner hashes the message key and takes the result modulo the topic's partition count to choose a partition. Understanding this mechanism is essential for ordering guarantees, avoiding hot partitions, and designing correct partition keys.
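A minimal sketch of hash-based partition selection follows. It uses CRC32 as a stand-in hash so the example stays short; the Java client's default partitioner uses murmur2, so the partition numbers here will not match a real cluster. The function name `partition_for` and the keys are hypothetical.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash(key) mod partition count."""
    # Stand-in hash (CRC32); the Java client's default partitioner uses murmur2.
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition while the count is fixed.
first = partition_for(b"customer-42", 6)
second = partition_for(b"customer-42", 6)
```

Because the mapping depends only on the key and the partition count, all messages for one key land in one partition, which is what makes per-key ordering possible. It also means a single very frequent key concentrates load on one partition.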
Without idempotence, the standard retry flow can produce **duplicates**:
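The failure mode can be simulated in a few lines: the broker appends the message, the acknowledgement is lost, and the producer retries. The sketch below is a simplified model, not Kafka's implementation; the real broker tracks sequence numbers per producer id and partition. The `Broker` class and `send_with_lost_ack` helper are hypothetical.

```python
class Broker:
    """Toy broker: one log, optional sequence-number deduplication."""

    def __init__(self, idempotent):
        self.idempotent = idempotent
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        if self.idempotent and seq <= self.last_seq.get(producer_id, -1):
            return "ack"  # already appended: acknowledge without writing again
        self.log.append(value)
        self.last_seq[producer_id] = seq
        return "ack"

def send_with_lost_ack(broker, producer_id, seq, value):
    broker.append(producer_id, seq, value)  # write succeeds, but the ack is lost
    broker.append(producer_id, seq, value)  # producer times out and retries

plain = Broker(idempotent=False)
send_with_lost_ack(plain, producer_id=1, seq=0, value="order-created")

dedup = Broker(idempotent=True)
send_with_lost_ack(dedup, producer_id=1, seq=0, value="order-created")
```

Without deduplication the retry lands in the log a second time; with the sequence check the retry is acknowledged but not written again.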
**Q1: What are the three layers required for end-to-end exactly-once in Kafka?**
**Q1: Explain Kafka's architecture in 2 minutes.**
**Q1: Walk me through what happens when a producer calls `send()`.**
    Producers ──► [ Broker Cluster ] ──► Consumers
                    │      │      │
                    B1     B2     B3
                           │
                    ZooKeeper / KRaft
A complete guide to Kafka brokers — what they are, how storage works, partition leadership, replication, ISR, KRaft vs ZooKeeper, log compaction, performance internals, and production monitoring. Beginner through senior depth.
**Kafka Connect** is a framework for **reliably moving data between Kafka and external systems** (databases, file systems, cloud services) without writing.
A **consumer** reads messages from Kafka topics. Unlike traditional queues (push-based), Kafka consumers **pull** messages at their own pace. This gives.
A complete guide to Kafka exactly-once semantics — delivery guarantees, idempotent producer, transactions, read_committed consumers, Kafka Streams EOS, zombie producer fencing, two-phase commit internals, and production patterns. Beginner through senior depth.
Apache Kafka is a **distributed event streaming platform** designed for high-throughput, fault-tolerant, and scalable real-time data pipelines and streaming.
A **producer** is a client application that publishes (writes) messages to Kafka topics. It is responsible for:
A comprehensive guide to Kafka Streams: from core concepts and internal architecture to stateful processing, failure recovery, and production system design patterns. Built for new learners and senior engineers alike.
A deep-dive into techniques for improving Kafka throughput — covering compression, batching, partitions, consumer parallelism, tuning configs, and their trade-offs.
A **topic** is a named, durable stream of messages in Kafka. Think of it as a logical category or feed where producers write and consumers read.
A comprehensive guide comparing Apache Kafka's legacy ZooKeeper architecture with the modern KRaft (Kafka Raft) metadata mode — covering internal mechanics, failure scenarios, migration strategies, and production deep dives for senior engineers.
Kafka guarantees **total ordering within a partition**. Messages written to the same partition are always read in the order they were appended to that partition's log. No ordering is guaranteed across partitions.
Guide to asynchronous messaging systems including Kafka, RabbitMQ, SQS, event sourcing, pub/sub patterns, consumer groups, ordering guarantees, and exactly-once semantics.
Consumer lag is the most important consumer metric:
Deep dive into the Confluent Parallel Consumer model for decoupling thread concurrency from partition counts safely.
A **partition** is an ordered, immutable sequence of records (a log) within a topic. Each partition lives on exactly one broker at a time (as leader) and.
A comprehensive guide to tuning Kafka Connect to prevent stop-the-world rebalance storms during routine patching and rolling restarts.
Kafka guarantees ordering within a partition, but single-threaded processing limits throughput. This guide covers four patterns for achieving high throughput while preserving per-key ordering.
The `acks` configuration controls **how many broker acknowledgements the producer requires before considering a send successful**. It directly trades off.
Idempotence protects against duplicates within a session, but it doesn't help when:
A comprehensive guide to the Raft Consensus Algorithm — covering leader election, log replication, safety guarantees, and how it is implemented in Apache Kafka's KRaft metadata mode.
Patterns for delivering real-time data to clients including WebSockets, Server-Sent Events, long polling, short polling, and push notification architectures.
The **replication factor** defines how many copies of each partition exist across the cluster.
Partitions are the unit of parallelism in Kafka. Scaling them is critical for throughput but can break ordering for keyed topics. This guide covers the mechanics, risks, and migration strategies.
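The ordering risk comes from the key-to-partition mapping changing when the partition count changes. The sketch below counts how many keys move when a topic grows from 6 to 8 partitions. CRC32 is a stand-in for the real hash (the Java client uses murmur2), and the key names and partition counts are made up for illustration.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32); the real default partitioner hash differs.
    return zlib.crc32(key.encode()) % num_partitions

keys = [f"order-{i}" for i in range(1000)]

# Keys whose partition changes when the topic grows from 6 to 8 partitions.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 8)]
moved_fraction = len(moved) / len(keys)
```

Any key in `moved` now has older messages in one partition and newer messages in another, so a consumer can no longer rely on per-key ordering across the resize without a migration strategy.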
Deep-dive into high write throughput techniques — sharding, partitioning, WAL internals, LSM trees, async pipelines, batching, backpressure, idempotency, and distributed transactions — with production Java/Spring code and failure mode analysis.
**Schema Registry** is a centralized repository for managing and validating schemas for Kafka messages. It ensures that producers and consumers agree on the.
A complete guide to the Transactional Outbox Pattern — from the Dual-Write problem for beginners to CDC vs polling internals, at-least-once guarantees, ordering semantics, and production monitoring for senior engineers.
A detailed collection of real interview questions and answers from a Walmart Java Developer interview. Ideal for candidates with 3+ years of experience, covering DSA, Core Java, System Design, Spring Boot, and Kafka.