CS262B Reading Summary

Practical Byzantine Fault Tolerance

Miguel Castro and Barbara Liskov

Summary by Feng Zhou
3/10/2004

Strong points of the paper are:

  1. The paper presents an algorithm for asynchronous state machine replication that tolerates Byzantine faults. The asynchronous part is important because schemes that rely on synchrony are vulnerable to attacks that can destroy liveness. Dropping the synchrony assumption raises the number of replicas required to 3f+1, instead of the 2f+1 required for synchronous systems.
  2. The basic invariant that the algorithm achieves is that all non-faulty replicas agree on the operations executed and on their order, and that clients get correct execution results. The primitive building block is the Byzantine consensus protocol, which ensures that in the presence of at most f faulty nodes, all non-faulty nodes among 3f+1 nodes agree on one bit of information sent by a certain node (which may itself be faulty). The algorithm in this paper achieves its goals by introducing a primary that orders and multicasts all requests on behalf of the clients. The consensus protocol is used at various stages to ensure that all replicas agree with one another. "Checkpointing" is used to limit the size of the logs. "Views" are used to deal with a faulty primary.
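
The replica-count arithmetic behind the 3f+1 figure can be sketched as follows. This is a minimal illustration, not code from the paper; the function names are hypothetical, and the 2f+1 quorum size is my reading of the protocol rather than something stated in this summary.

```python
def min_replicas(f: int) -> int:
    """Replicas needed to tolerate f Byzantine faults without synchrony."""
    return 3 * f + 1


def max_faults(n: int) -> int:
    """Largest f such that n replicas still satisfy n >= 3f + 1."""
    return (n - 1) // 3


def quorum_size(f: int) -> int:
    """Assumed quorum size: 2f + 1 matching messages."""
    return 2 * f + 1


def min_quorum_overlap(f: int) -> int:
    """Smallest possible intersection of two quorums among 3f + 1 replicas."""
    n = min_replicas(f)
    q = quorum_size(f)
    # Two sets of size q drawn from n replicas share at least 2q - n members.
    return 2 * q - n
```

For f = 1 this gives 4 replicas and quorums of 3. Any two quorums share at least f+1 = 2 replicas, so at least one replica in the intersection is non-faulty, which is what lets non-faulty replicas agree on a single order.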

One major flaw:

I don't understand the read-only optimization. It seems that it cannot guarantee that all replicas see the same service state when executing the request, so they may return different results. The paper isn't clear about why this cannot happen.
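
For reference, my reading of the paper's description is that the client accepts a read-only result only if 2f+1 different replicas return the same value, and otherwise reissues the request through the normal ordered path. The sketch below shows only that client-side check. It is an assumption-laden illustration: the function name and the dictionary-of-replies representation are hypothetical, and it does not model authentication or retransmission timers.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


def accept_read_only(replies: Dict[int, Hashable], f: int) -> Optional[Hashable]:
    """Return the agreed result, or None if no value has 2f + 1 matching replies.

    `replies` maps a replica id to the result that replica returned, so each
    replica is counted at most once (hypothetical representation).
    """
    if not replies:
        return None
    counts = Counter(replies.values())
    result, votes = counts.most_common(1)[0]
    # None signals that the caller should retry as a regular ordered request.
    return result if votes >= 2 * f + 1 else None
```

With f = 1 and replies of "a", "a", "a", "b" the check accepts "a". With "a", "a", "b", "b" it returns None, which is the case where replicas saw different states and the fast path has to be abandoned.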