CS262B Reading Summary

Flexible Update Propagation for Weakly Consistent Replication

Karin Petersen et al.

Summary by Feng Zhou
3/3/2004

Strong points of the paper are:

  1. The version vector is the key data structure for tracking updates. Each server keeps a vector recording, for every server in the system, the newest update it has received that originated at that server. The prefix property states that all updates from that server prior to the one recorded in the vector are guaranteed to have already been incorporated locally. Therefore, only one version number needs to be recorded per server. When propagating changes from S to R, S compares the two version vectors and sends R only the updates R is missing.
  2. Write stabilization is important because it guarantees that an update will never need to be reapplied *locally*. It is therefore safe for the local site to apply the update to the database and discard it. However, this does not mean other sites will never need the update again, so a judicious decision must be made about when to discard stable writes. The later a write is discarded, the less likely it is that other sites will need a full database transfer.
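
The anti-entropy step in point 1 and the truncation trade-off in point 2 can be illustrated with a minimal Python sketch. It assumes each server stamps the writes it accepts with increasing integers, and that a server remembers, per originating server, the highest stamp it has already discarded from its log (the `omitted` map). `Write`, `Replica`, `anti_entropy`, and `omitted` are hypothetical names chosen for illustration, not the paper's interfaces, and the sketch ignores commit order entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Write:
    server: str  # server that originally accepted the write
    stamp: int   # accept-stamp; increases per accepting server
    op: str      # opaque payload, for illustration only


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)      # writes still held in the log
    vv: dict = field(default_factory=dict)       # version vector: server -> highest stamp received
    omitted: dict = field(default_factory=dict)  # hypothetical: server -> highest stamp discarded

    def receive(self, w: Write) -> None:
        # Prefix property: writes from one server arrive in stamp order, so a
        # single number per server summarizes everything received from it.
        assert w.stamp > self.vv.get(w.server, 0), "out-of-order write"
        self.log.append(w)
        self.vv[w.server] = w.stamp


def anti_entropy(sender: Replica, receiver: Replica) -> str:
    """Bring receiver up to date from sender's log; report which path was taken."""
    # If sender has discarded writes that receiver has not seen yet, its log
    # can no longer fill the gap: a full database transfer would be needed.
    for server, cut in sender.omitted.items():
        if receiver.vv.get(server, 0) < cut:
            return "full"
    # Otherwise send exactly the writes beyond receiver's version vector.
    for w in sender.log:
        if w.stamp > receiver.vv.get(w.server, 0):
            receiver.receive(w)
    return "incremental"


# Usage: S holds writes A1..A3, R has only A1.
s, r = Replica("S"), Replica("R")
for i in (1, 2, 3):
    s.receive(Write("A", i, f"a{i}"))
r.receive(Write("A", 1, "a1"))
print(anti_entropy(s, r), r.vv)  # incremental {'A': 3}

# Suppose S discards A1 and A2; a fresh replica T can no longer catch up from the log.
s.log = [w for w in s.log if w.stamp > 2]
s.omitted["A"] = 2
print(anti_entropy(s, Replica("T")))  # full
```

In this model, a sender that has truncated writes its peer has not yet seen cannot bring that peer up to date from its log alone, which is why discarding stable writes later makes full transfers rarer.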

One major flaw:

The correctness of the Bayou protocol depends on a couple of important assumptions that the authors do not make explicit in the paper. For example, one crucial assumption is that applying concurrent updates, whether conflicting or not, in a different order will produce the same final database state. This requires "perfect" conflict-resolution methods, which seem hard to find for many applications. Without them, the system needs some way to prevent the inconsistent data that results from different conflict-resolution outcomes.
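
A small example makes this concern concrete. The Python sketch below assumes a hypothetical calendar merge procedure, `book`, that takes the requested slot if it is free and otherwise the first free alternate. Two concurrent requests for the same slot leave two replicas in different states if each replica applies them in a different order.

```python
def book(db: dict, who: str, wanted: str, alternates: list) -> None:
    """Hypothetical merge procedure: take the wanted slot, else the first free alternate."""
    for slot in [wanted, *alternates]:
        if slot not in db:
            db[slot] = who
            return
    # No free slot: this sketch simply drops the request.


def replay(writes: list) -> dict:
    """Apply writes in the given order to an empty database (slot -> owner)."""
    db = {}
    for who, wanted, alternates in writes:
        book(db, who, wanted, alternates)
    return db


# Two concurrent requests for the same slot, issued at different replicas.
w1 = ("alice", "10am", ["11am"])
w2 = ("bob", "10am", ["11am"])

print(replay([w1, w2]))  # {'10am': 'alice', '11am': 'bob'}
print(replay([w2, w1]))  # {'10am': 'bob', '11am': 'alice'}
```

Avoiding this divergence would take either merge procedures whose results do not depend on order, or some mechanism that makes every replica apply writes in the same order.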

A summary table or diagram explaining the multitude of stamps and variables in the system would help the reader a lot.

The paper also does not discuss when to truncate the log of committed writes. This is an important question: a systematic truncation policy will be necessary if predictable behavior is required.
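
As an illustration of what a systematic policy could look like, here is one hypothetical rule sketched in Python (it is not taken from the paper): a server discards a committed write only once every peer's last-known version vector shows that the peer already has it. Writes are modeled as `(server, accept_stamp)` pairs, and `truncation_point` and `truncate` are illustrative names.

```python
def truncation_point(peer_vectors: list) -> dict:
    """Element-wise minimum over peers' last-known version vectors.

    A write (server, stamp) with stamp <= result[server] has reached every peer.
    Stale vectors are conservative here, since version vectors only grow.
    """
    servers = set().union(*peer_vectors)
    return {s: min(v.get(s, 0) for v in peer_vectors) for s in servers}


def truncate(log: list, committed: set, peer_vectors: list) -> list:
    """Drop committed writes that every peer already has; keep everything else."""
    cut = truncation_point(peer_vectors)
    return [w for w in log
            if not (w in committed and w[1] <= cut.get(w[0], 0))]


# Usage: the second peer has only seen A1, so only A1 can be discarded.
log = [("A", 1), ("A", 2), ("B", 1)]
committed = {("A", 1), ("A", 2), ("B", 1)}
peers = [{"A": 2, "B": 1}, {"A": 1}]
print(truncate(log, committed, peers))  # [('A', 2), ('B', 1)]
```

The cost of this particular rule is that a single peer that stays disconnected blocks truncation indefinitely, so a practical policy would likely also need a bound on log size or age.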