CS262B Reading Summary

Scalable, Distributed Data Structures for Internet Service Construction

Steven D. Gribble et al.

Summary by Feng Zhou
2/1/2004

Strong points of the paper are:

  1. Even today, a flexible global data management facility is a much-needed building block for cluster-based Internet services.  Neither databases nor (distributed) file systems are a good fit.  Databases do not scale well, and a centralized one offers poor availability.  File systems provide weak atomicity and consistency guarantees.  Distributed data structures are therefore a good candidate for this role.
  2. The consistency strategy (optimistic two-phase commit) is a good choice for the cluster environment.  For example, when the manager crashes, the replicas talk to each other to agree on committing or aborting instead of waiting for the manager to recover.  This exploits the fact that all replicas are on the same LAN and connected.  Another useful decision is to remove a replica from its replica group when it crashes, instead of waiting for it to recover.  This increases availability without sacrificing data consistency.
  3. The technique used to keep the metadata maps consistent is useful.  The DDS library piggybacks hashes of its maps on every command sent to the bricks, and the bricks verify that those maps are up to date before carrying out the operation.  This means DDS library instances do not need to hold an up-to-date copy of the maps and do not need to be notified of every update to them.
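
The map check in point 3 can be sketched in a few lines of Python. This is an illustration only, not the paper's implementation: the class names (`Brick`, `DDSLibrary`), the JSON/SHA-256 encoding of the map, and the choice to have the brick reject a stale request and return its current map are all assumptions made for the example.

```python
import hashlib
import json


def map_hash(partition_map: dict) -> str:
    """Stable digest of a metadata map (the encoding is an assumption)."""
    encoded = json.dumps(partition_map, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class Brick:
    """Hypothetical storage node that holds the current map."""

    def __init__(self, partition_map: dict):
        self.partition_map = partition_map
        self.store = {}

    def put(self, key, value, client_map_hash: str) -> dict:
        # Verify the piggybacked hash before carrying out the operation.
        if client_map_hash != map_hash(self.partition_map):
            # Stale client: refuse the operation and return the current map.
            return {"ok": False, "current_map": self.partition_map}
        self.store[key] = value
        return {"ok": True}


class DDSLibrary:
    """Hypothetical client-side library holding a possibly stale map."""

    def __init__(self, partition_map: dict):
        self.partition_map = dict(partition_map)

    def put(self, brick: Brick, key, value) -> bool:
        reply = brick.put(key, value, map_hash(self.partition_map))
        if not reply["ok"]:
            # Repair the local map lazily, then retry once.
            self.partition_map = dict(reply["current_map"])
            reply = brick.put(key, value, map_hash(self.partition_map))
        return reply["ok"]


brick = Brick({"0": ["brickA"], "1": ["brickB"]})
lib = DDSLibrary({"0": ["brickA"]})  # stale copy of the map
lib.put(brick, "k", "v")             # first attempt is refused, retry succeeds
```

The library learns about a map change only when it next talks to a brick, so no update has to be broadcast to every library instance.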

One major flaw:

DDS does not provide transaction support. This limits its use to non-mission-critical applications; otherwise applications must provide ACID guarantees themselves, which is a daunting job for application developers.
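
A small Python sketch shows why this matters. It is a toy model, not the DDS API: `SingleOpStore` and `transfer` are hypothetical names, and the only assumption is that each individual `put` is atomic while a sequence of puts is not.

```python
class SingleOpStore:
    """Toy store in which each get/put is atomic on its own (assumed model)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value


def transfer(store, src, dst, amount, crash_midway=False):
    """Move `amount` from src to dst using two independent puts."""
    store.put(src, store.get(src) - amount)
    if crash_midway:
        # Simulate the service instance failing between the two updates.
        raise RuntimeError("crashed between the two puts")
    store.put(dst, store.get(dst) + amount)


store = SingleOpStore()
store.put("a", 100)
store.put("b", 0)
try:
    transfer(store, "a", "b", 30, crash_midway=True)
except RuntimeError:
    pass
# "a" was debited but "b" was never credited: the two keys now disagree.
```

Without transactions, recovering from such a partial update (for example by logging and rolling back) is left to the application.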