Designing Data-Intensive Applications
The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, by Martin Kleppmann (O'Reilly, 2017)
What Is This Book About?
Many modern applications are data-intensive rather than compute-intensive: raw CPU power is rarely the bottleneck. The real challenges are:
- The volume of data
- The complexity of data
- The speed at which data changes
This book cuts through the buzzwords (NoSQL, Big Data, CAP theorem, eventual consistency, …) and explains the engineering principles behind the tools, so you can make informed architectural decisions.
Book Structure
The book is divided into three parts, covering 12 chapters:
Part I: Foundations of Data Systems
Covers ideas that apply to any data system, whether on a single machine or a cluster.
| Chapter | Topic |
|---|---|
| Chapter 1 | Reliable, Scalable, and Maintainable Applications |
| Chapter 2 | Data Models and Query Languages |
| Chapter 3 | Storage and Retrieval |
| Chapter 4 | Encoding and Evolution |
Part II: Distributed Data
Covers what happens when data is spread across multiple machines for scalability and fault tolerance.
| Chapter | Topic |
|---|---|
| Chapter 5 | Replication |
| Chapter 6 | Partitioning |
| Chapter 7 | Transactions |
| Chapter 8 | The Trouble with Distributed Systems |
| Chapter 9 | Consistency and Consensus |
Part III: Derived Data
Systems that transform and combine datasets to produce new ones.
| Chapter | Topic |
|---|---|
| Chapter 10 | Batch Processing |
| Chapter 11 | Stream Processing |
| Chapter 12 | The Future of Data Systems |
Who Should Read This?
- Backend / platform engineers who store and process data at scale
- Software architects choosing between databases, queues, and processing frameworks
- Technical leads who need to reason about trade-offs in distributed systems
You should be comfortable with SQL and basic backend development. Everything else is explained from first principles.
Key Themes
- Reliability: working correctly even when things go wrong
- Scalability: handling growth in data, traffic, and complexity
- Maintainability: being easy to work on over time by different teams
These three properties are introduced in Chapter 1 and recur throughout the book, tying it together.
Quick Navigation
- Start DDIA: Chapter 1 - Reliable, Scalable, and Maintainable Applications
- Jump to distributed systems: Chapter 8 - The Trouble with Distributed Systems