CS262B Reading Summary
Practical Byzantine Fault Tolerance
Miguel Castro and Barbara Liskov
Summary by Feng Zhou
3/10/2004
Strong points of the paper are:
- The paper presents an algorithm for asynchronous state machine
replication that tolerates Byzantine faults. The asynchrony part is
important because synchronous schemes are vulnerable to attacks that
can destroy liveness. Asynchrony raises the number of replicas
required to 3f+1, instead of the 2f+1 required for synchronous
systems.
- The basic invariant that the algorithm achieves is that all
non-faulty replicas agree on the operations executed and on their
order, and the clients get correct execution results. The primitive
building block is the Byzantine consensus protocol, which basically
makes sure that in the presence of at most f faulty nodes, all
non-faulty nodes among the 3f+1 nodes agree on one bit of information
sent by a certain node (which may itself be faulty). The algorithm in
this paper achieves its goals by introducing a primary that orders and
multicasts all requests on behalf of the clients. The consensus
protocol is used in various stages to ensure that all replicas agree
with one another. "Checkpointing" is used to limit the size of the
logs. "Views" are used to deal with a faulty primary.
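
The 3f+1 bound can be checked with a little arithmetic. The sketch
below is only an illustration: the function names are made up for this
summary, and the quorum size of 2f+1 is taken from the paper rather
than from the points above. The reasoning is that a node can wait for
at most n-f replies (f replicas may never answer), and up to f of
those replies may come from faulty replicas, so n-2f must exceed f.

```python
def min_replicas(f: int) -> int:
    """Smallest group size tolerating f Byzantine replicas under asynchrony."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """Quorum size assumed here (2f+1, as in the paper)."""
    return 2 * f + 1


def min_quorum_overlap(f: int) -> int:
    """Fewest replicas that any two quorums share when n = 3f+1."""
    n = min_replicas(f)
    q = quorum_size(f)
    # Two sets of size q drawn from n replicas overlap in at least 2q - n.
    return 2 * q - n


def max_faults(n: int) -> int:
    """Largest f that a group of n replicas can tolerate."""
    return (n - 1) // 3


if __name__ == "__main__":
    for f in range(4):
        print(f, min_replicas(f), quorum_size(f), min_quorum_overlap(f))
```

With n = 3f+1, any two quorums share at least f+1 replicas, so they
share at least one non-faulty replica. Adding replicas without
reaching the next 3f+1 threshold does not help: 6 replicas still
tolerate only one fault.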
One major flaw.
I don't understand the read-only optimization. It seems that it
cannot guarantee that all replicas see the same service state when
executing the request, and thus they may return different results.
The paper isn't clear about why this cannot happen.
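
For reference, here is a rough sketch of the client-side rule that
bears on this concern, as I read the paper: replicas may indeed answer
from different states, and the client accepts a read-only result only
when 2f+1 distinct replicas return the same value, otherwise it
resubmits the request through the normal protocol. The function name
and the reply representation are hypothetical, chosen only for this
illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


def accept_read_only(replies: Dict[int, Hashable], f: int) -> Optional[Hashable]:
    """Return the agreed result, or None if the client must fall back.

    replies maps a replica id to the result that replica returned, so
    each replica is counted at most once.
    """
    counts = Counter(replies.values())
    if not counts:
        return None
    result, votes = counts.most_common(1)[0]
    # Accept only if 2f+1 distinct replicas sent the same result.
    if votes >= 2 * f + 1:
        return result
    # Otherwise the request is retried as a regular (ordered) request.
    return None
```

With f = 1 and four replicas, three matching replies are accepted,
while a two-against-two split returns None and forces the fallback.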