13 docs tagged with "ddia"

Chapter 10: Batch Processing

So far the book has focused on systems that handle requests as they arrive (OLTP) or read/write in real time. But some of the most important data processing...

Chapter 11: Stream Processing

Batch processing has one problem: **latency**. A job that runs once a day produces insights that can be up to 24 hours stale. Stream processing is like a continuous batch.

Chapter 3: Storage and Retrieval

As an application developer, you usually just call your database and trust it to do the right thing. But to choose the right database and tune it properly, you...

Chapter 4: Encoding and Evolution

Applications change over time: requirements evolve, new features are added, bugs are fixed. Your data model must evolve too. But in large systems, you can't...

Chapter 5: Replication

**Replication** means keeping a copy of the same data on multiple machines (connected via a network). Reasons to replicate:

Chapter 6: Partitioning

For very large datasets or very high query throughput, a single machine is not enough. **Partitioning** (also called sharding) breaks the data into...

Chapter 7: Transactions

Real applications are messy: the database can crash, network connections can drop, multiple clients write concurrently, and partial reads of partially updated...

Chapter 9: Consistency and Consensus

Chapter 8 cataloged everything that can go wrong in distributed systems. This chapter asks: **given all those failure modes, what guarantees can we actually...**