Advanced Consensus and BFT
Raft/Paxos assume nodes fail by crashing. BFT protocols handle arbitrary or malicious behavior.
Beginner View
Crash Fault vs Byzantine Fault
- Crash fault: node stops responding
- Byzantine fault: node responds incorrectly, inconsistently, or maliciously
Most enterprise systems use crash-fault consensus (Raft/Paxos). BFT is warranted only when some nodes cannot be trusted to follow the protocol.
Replica Count Intuition
- Crash fault tolerance: usually 2f + 1 replicas for f faults
- Byzantine fault tolerance: usually 3f + 1 replicas for f Byzantine faults
The larger replica set, and the larger quorums that come with it, are one reason BFT is more expensive.
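The replica counts above are easy to tabulate. The following is a minimal sketch; the function names `min_replicas` and `quorum_size` are illustrative and not taken from any library, and the quorum sizes assume a minimally sized cluster.

```python
def min_replicas(f: int, byzantine: bool) -> int:
    """Smallest cluster size that tolerates f faulty nodes."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1 if byzantine else 2 * f + 1


def quorum_size(f: int, byzantine: bool) -> int:
    """Votes needed to commit in a minimally sized cluster.

    CFT: a majority of 2f + 1 replicas is f + 1.
    BFT: 2f + 1 out of 3f + 1 replicas.
    """
    return 2 * f + 1 if byzantine else f + 1


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(f, min_replicas(f, byzantine=False), min_replicas(f, byzantine=True))
    # prints:
    # 1 3 4
    # 2 5 7
    # 3 7 10
```

Tolerating a single fault therefore takes 3 replicas under the crash model and 4 under the Byzantine model, and the gap widens by one replica for every additional fault.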
Senior Deep Dive
Why BFT Costs More
- More message rounds for agreement
- Signature/verification overhead
- Higher network fan-out and latency
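To get a feel for the fan-out difference, the sketch below counts messages for one decision under two deliberately simplified models. Both models are assumptions made for illustration: the BFT model is only loosely shaped like a three-phase all-to-all protocol, and it ignores client traffic, batching, heartbeats, and view changes.

```python
def crash_fault_messages(n: int) -> int:
    """Assumed model: the leader sends to n - 1 followers and each one acks."""
    return 2 * (n - 1)


def all_to_all_bft_messages(n: int) -> int:
    """Assumed model: one leader proposal, then two all-to-all voting phases."""
    proposal = n - 1                 # leader -> every other replica
    voting_phase = n * (n - 1)       # every replica -> every other replica
    return proposal + 2 * voting_phase


if __name__ == "__main__":
    for f in (1, 2, 3):
        n_cft, n_bft = 2 * f + 1, 3 * f + 1
        print(f, crash_fault_messages(n_cft), all_to_all_bft_messages(n_bft))
```

Under these assumptions, tolerating one fault costs 4 messages in the crash-fault model and 27 in the all-to-all model. The first count grows linearly with cluster size and the second quadratically.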
Protocol Families
- PBFT: classic three-phase protocol; high communication overhead
- HotStuff: leader-based, pipeline-friendly design with linear communication per phase and a simpler leader change
- Tendermint-style: practical BFT for validator-based systems
Decision Framework
Use crash-fault consensus when:
- Single organization control plane
- Strongly authenticated infra and low adversarial risk
Use BFT when:
- Multi-organization governance
- Adversarial environment cannot be ignored
- Cost of inconsistent/malicious state is catastrophic
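The checklist above can be written down as a small helper for design reviews. This is a sketch only: the `Deployment` fields and `recommend_fault_model` are hypothetical names, and treating any single BFT criterion as sufficient is an assumption, not a rule stated by the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    multi_org_governance: bool       # nodes run by independent organizations
    adversarial_participants: bool   # malicious nodes are a realistic threat
    catastrophic_if_corrupted: bool  # a bad commit cannot be tolerated or undone


def recommend_fault_model(d: Deployment) -> str:
    """Return "BFT" if any BFT criterion applies, otherwise "CFT" (assumed rule)."""
    if (d.multi_org_governance
            or d.adversarial_participants
            or d.catastrophic_if_corrupted):
        return "BFT"
    return "CFT"


if __name__ == "__main__":
    registry = Deployment(False, False, False)
    settlement = Deployment(True, False, True)
    print(recommend_fault_model(registry))    # CFT
    print(recommend_fault_model(settlement))  # BFT
```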
Failure Models and Risk Mapping
| Environment | Recommended model | Reason |
|---|---|---|
| Internal service registry | Crash fault | Trusted infra, lower cost |
| Cross-company settlement network | BFT | Independent trust domains |
| Public validator network | BFT | Adversarial participants expected |
Operational Considerations
- Benchmark consensus latency under realistic geo RTT
- Use hardware crypto acceleration if signature-heavy
- Define quorum-loss runbooks and emergency governance flows
- Continuously test node equivocation/Byzantine simulation in staging
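For the equivocation testing mentioned above, a staging harness needs some way to spot a node that voted for two different blocks in the same slot. A minimal sketch follows; the `Vote` record and its fields (`height`, `round`, `block_hash`) are assumptions loosely modeled on validator-style protocols, and signature verification is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Vote(NamedTuple):
    voter: str
    height: int
    round: int
    block_hash: str


def find_equivocations(votes: Iterable[Vote]) -> Dict[Tuple[str, int, int], Set[str]]:
    """Return (voter, height, round) slots in which a voter backed more than one block."""
    seen: Dict[Tuple[str, int, int], Set[str]] = defaultdict(set)
    for v in votes:
        seen[(v.voter, v.height, v.round)].add(v.block_hash)
    return {slot: hashes for slot, hashes in seen.items() if len(hashes) > 1}


if __name__ == "__main__":
    captured = [
        Vote("n1", 10, 0, "aaa"),
        Vote("n2", 10, 0, "aaa"),
        Vote("n3", 10, 0, "aaa"),
        Vote("n3", 10, 0, "bbb"),  # n3 equivocates
    ]
    print(find_equivocations(captured))
```

A repeated identical vote is not flagged, since retransmissions are normal; only conflicting block hashes for the same slot count as equivocation here.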
Interview Questions
Q: Why can Raft not handle Byzantine faults by design?
A: Raft assumes fail-stop or crash faults and honest message behavior. If nodes lie or equivocate, Raft's quorum logic cannot guarantee safety.
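A toy tally makes the failure concrete. The sketch below is not Raft itself, only a model of majority vote counting in a 3-node cluster, with hypothetical node names: when node C votes twice in the same term, two candidates each reach a majority.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Ballot = Tuple[str, str]  # (voter, candidate) for a single term


def elected(ballots: Iterable[Ballot], majority: int) -> List[str]:
    """Return every candidate whose vote count reaches the majority threshold."""
    tally = Counter(candidate for _, candidate in ballots)
    return sorted(c for c, votes in tally.items() if votes >= majority)


MAJORITY = 2  # majority of a 3-node cluster

# Honest run: A and B vote for themselves, C votes once.
honest = [("A", "A"), ("B", "B"), ("C", "A")]

# Byzantine run: C tells A "I vote for you" and tells B the same thing.
byzantine = honest + [("C", "B")]

if __name__ == "__main__":
    print(elected(honest, MAJORITY))     # ['A']
    print(elected(byzantine, MAJORITY))  # ['A', 'B']
```

Each candidate only sees the votes addressed to it, so neither A nor B can tell that C voted twice. Two leaders in one term is exactly the outcome that one-vote-per-term is meant to rule out.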
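Q: Explain why BFT generally needs 3f + 1 replicas.
A: The system must make progress while f replicas stay silent, so a quorum can contain at most n - f replicas. Any two such quorums must also overlap in at least f + 1 replicas, so that at least one replica in the overlap is honest and will not vouch for two conflicting values. With n = 3f + 1 and quorums of 2f + 1, two quorums share at least 2(2f + 1) - (3f + 1) = f + 1 replicas, which meets both requirements. With only 3f replicas the overlap can shrink to f, all of which may be Byzantine.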
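The quorum-overlap argument can be checked with a few lines of arithmetic. This is a sketch under the usual assumption that a quorum is n - f replicas, the most the protocol can wait for when f replicas may never answer; the function names are illustrative.

```python
def min_quorum_overlap(n: int, q: int) -> int:
    """Fewest replicas that two quorums of size q can share among n replicas."""
    return max(0, 2 * q - n)


def honest_overlap_guaranteed(n: int, f: int) -> bool:
    """True if any two quorums of size n - f share at least one honest replica."""
    q = n - f
    return min_quorum_overlap(n, q) > f


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(f, honest_overlap_guaranteed(3 * f, f), honest_overlap_guaranteed(3 * f + 1, f))
    # prints:
    # 1 False True
    # 2 False True
    # 3 False True
```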
Q: When is BFT over-engineering for enterprise systems?
A: If nodes are under one trusted operator and threat model is mostly crashes/outages, crash-fault consensus is usually enough. BFT cost is rarely justified without adversarial trust boundaries.
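Q: Compare PBFT and HotStuff at a high level.
A: PBFT relies on all-to-all voting phases, so message complexity is quadratic in the replica count during normal operation and higher still during a view change. HotStuff routes votes through the leader, which keeps communication linear in the replica count, including on leader change, and it pipelines its three phases across consecutive proposals.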
Q: How do trust assumptions drive consensus choice?
A: If participants can be malicious, choose BFT; if they are trusted but can crash, choose CFT like Raft/Paxos. Consensus should match the strongest realistic failure mode.
Q: What are practical performance bottlenecks in BFT systems?
A: Signature verification, all-to-all messaging, and WAN latency dominate. Performance degrades quickly with replica count unless batching and crypto acceleration are used.
Q: How would you justify BFT to a product team concerned about latency?
A: Frame BFT as risk reduction for high-value, multi-party trust domains where incorrect commits are catastrophic. Then scope BFT to critical write paths and keep read paths optimized separately.
Q: What staging tests would you run for Byzantine behavior?
A: Inject equivocation, forged signatures, delayed/reordered messages, and split views under load. Verify safety invariants (no conflicting commits) and bounded recovery time.
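The "no conflicting commits" invariant can be asserted directly on the logs collected from honest replicas after a fault-injection run. The sketch below assumes each replica exposes its committed blocks as a `{height: block_hash}` mapping; that log shape and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Log = Dict[int, str]  # height -> committed block hash


def find_conflicting_commits(logs: Dict[str, Log]) -> List[Tuple[int, str, str]]:
    """Return (height, first_replica, conflicting_replica) for every disagreement."""
    first_seen: Dict[int, Tuple[str, str]] = {}
    conflicts: List[Tuple[int, str, str]] = []
    for replica, log in sorted(logs.items()):
        for height, block_hash in sorted(log.items()):
            owner, expected = first_seen.setdefault(height, (replica, block_hash))
            if expected != block_hash:
                conflicts.append((height, owner, replica))
    return conflicts


if __name__ == "__main__":
    logs = {
        "r1": {1: "a", 2: "b"},
        "r2": {1: "a", 2: "c"},  # disagrees with r1 at height 2
        "r3": {1: "a"},          # lagging replicas are fine
    }
    print(find_conflicting_commits(logs))  # [(2, 'r1', 'r2')]
```

A replica that is merely behind does not violate safety, so missing heights are ignored; only two different hashes at the same height are reported.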