MIT 6.824 · Lecture 4: Primary-Backup Replication

Primary-Backup Replication for Fault Tolerance

Goal: achieve fault tolerance

Failures

Replication approaches:

State Transfer: transfer memory.
- The primary replica executes the service.
- The primary sends its entire state to the backups.
- The state may be too large, and therefore slow to transfer over the network.
Replicated State Machine: send only the external events (operations), not the state.
- If the replicas begin in the same start state and apply the same deterministic operations in the same order, they reach the same end state.
- Generates less network traffic.
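
The replicated state machine argument can be illustrated with a minimal Go sketch. The `Op` and `Replica` types and the append-only key/value semantics are assumptions made for this example, not part of the lecture. Two replicas that start empty and apply the same operations in the same order end up in the same state.

```go
package main

import "fmt"

// Op is a hypothetical operation: append Value to the string stored at Key.
// Appends are order-sensitive, so replicas agree only if they apply the
// same operations in the same order.
type Op struct {
	Key   string
	Value string
}

// Replica is a deterministic state machine: its state changes only via Apply.
type Replica struct {
	state map[string]string
}

// NewReplica returns a replica in the (shared) empty start state.
func NewReplica() *Replica {
	return &Replica{state: make(map[string]string)}
}

// Apply depends only on the current state and the operation, so it is
// deterministic.
func (r *Replica) Apply(op Op) {
	r.state[op.Key] += op.Value
}

// Equal reports whether two replicas hold identical state.
func Equal(a, b *Replica) bool {
	if len(a.state) != len(b.state) {
		return false
	}
	for k, v := range a.state {
		if w, ok := b.state[k]; !ok || w != v {
			return false
		}
	}
	return true
}

func main() {
	// The primary forwards only this operation stream, never its full state.
	ops := []Op{{"x", "a"}, {"y", "c"}, {"x", "b"}}

	primary, backup := NewReplica(), NewReplica()
	for _, op := range ops {
		primary.Apply(op)
		backup.Apply(op)
	}
	fmt.Println("replicas agree:", Equal(primary, backup))
}
```

If the backup applied the two `x` operations in the opposite order, its value for `x` would differ from the primary's, which is why ordering is part of the requirement.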

Replication level:

Application state: like GFS.
- Efficient: the primary sends only high-level operations to the backup.
- The application must implement fault tolerance itself, for example by forwarding the operation stream.
Machine level: registers and RAM contents.
- Forwards machine events: interrupts, DMA, etc.
- Requires modifications so that machines can send and receive the event stream.

What state (to replicate)?

Primary-Backup sync
cut-over: when the primary fails, the client needs a mechanism to switch its target from the primary to the backup.
anomalies
new replicas
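
One way the client side of cut-over might look is sketched below in Go. Everything here is hypothetical: `Server` stands in for a real RPC endpoint, and the retry policy (try each replica at most once per call) is an assumption. Some systems instead make the switch transparent to clients, so treat this only as an illustration of the idea.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnreachable is returned when no replica answers.
var ErrUnreachable = errors.New("server unreachable")

// Server is a hypothetical stand-in for a remote replica.
type Server struct {
	Name  string
	Alive bool
}

// Handle simulates serving a request; a failed server returns an error.
func (s *Server) Handle(req string) (string, error) {
	if !s.Alive {
		return "", ErrUnreachable
	}
	return s.Name + " handled " + req, nil
}

// Client remembers which replica it currently treats as the primary.
type Client struct {
	servers []*Server
	target  int // index of the current target in servers
}

// Call sends req to the current target. On failure it cuts over to the
// next replica and retries, trying each replica at most once.
func (c *Client) Call(req string) (string, error) {
	for i := 0; i < len(c.servers); i++ {
		reply, err := c.servers[c.target].Handle(req)
		if err == nil {
			return reply, nil
		}
		c.target = (c.target + 1) % len(c.servers)
	}
	return "", ErrUnreachable
}

func main() {
	primary := &Server{Name: "primary", Alive: true}
	backup := &Server{Name: "backup", Alive: true}
	client := &Client{servers: []*Server{primary, backup}}

	fmt.Println(client.Call("get x"))
	primary.Alive = false // simulate a primary failure
	fmt.Println(client.Call("get x"))
}
```

The client keeps the new target after a successful cut-over, so later calls go straight to the backup instead of retrying the failed primary.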

Non-deterministic events:

Each log entry:

Output rule: the primary may not send a response to the client until it has sent the corresponding log entry to the backup's VMM and the backup has acknowledged it.
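
The output rule can be sketched in Go with channels standing in for the logging channel between primary and backup. The `LogEntry`, `Backup`, and `Primary` types and the reply format are assumptions for illustration; the point is only that `Handle` blocks on the acknowledgement before returning any output.

```go
package main

import "fmt"

// LogEntry is a hypothetical record of one event the backup must replay.
type LogEntry struct {
	Seq  int
	Data string
}

// Backup records incoming log entries and acknowledges each one.
type Backup struct {
	entries chan LogEntry
	acks    chan int
	Log     []LogEntry
}

// NewBackup starts a goroutine that plays the role of the backup.
func NewBackup() *Backup {
	b := &Backup{entries: make(chan LogEntry), acks: make(chan int)}
	go b.run()
	return b
}

func (b *Backup) run() {
	for e := range b.entries {
		b.Log = append(b.Log, e) // record the entry first
		b.acks <- e.Seq          // then acknowledge it
	}
}

// Primary forwards a log entry for every request it handles.
type Primary struct {
	backup *Backup
	seq    int
}

// Handle returns the client-visible reply only after the backup has
// acknowledged the log entry for this request (the output rule).
func (p *Primary) Handle(req string) string {
	p.seq++
	p.backup.entries <- LogEntry{Seq: p.seq, Data: req}
	ack := <-p.backup.acks // block: no output before the acknowledgement
	return fmt.Sprintf("done %s (acked %d)", req, ack)
}

func main() {
	backup := NewBackup()
	primary := &Primary{backup: backup}

	reply := primary.Handle("put x")
	// By the time the reply exists, the backup already holds the entry.
	fmt.Println(reply, "backup log length:", len(backup.Log))
}
```

Because the reply is withheld until the acknowledgement arrives, a primary failure right after responding cannot leave the client having seen output that the backup knows nothing about.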