CS262B Reading Summary

The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Summary by Feng Zhou
2/4/2004

Strong points of the paper are:

  1. This is a "domain-specific" file system designed to exploit many characteristics of its applications.  Some of the techniques are very interesting, and the performance and availability results are very encouraging.
  2. Two important factors of the usage pattern are that bandwidth matters more than latency and that files are large.  This means the designs for minimizing control messages and cache-coherence messages can go away: data traffic dominates the network because files are large, and applications don't care about a few extra round trips when opening a file.  This simplifies the design a lot and lets the designers lean it more toward availability and manageability.
  3. The "record append" semantics suits a lot of applications.  It is both useful and efficient to implement.  It is therefore better than enforcing UNIX file semantics over the network, which has been shown to be slow at best, if not impossible.
  4. Leases and a primary replica are a good mechanism for implementing ordering of updates with good performance.  This is better than letting the master do all the ordering.
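
To make points 3 and 4 concrete, here is a minimal single-process sketch of how a lease-holding primary replica can order record appends without involving the master on each update.  The class names (Replica, Primary), the lease_expiry field, and the error message are hypothetical and chosen only for illustration; this is not the paper's implementation, and it ignores networking, chunk boundaries, and failure handling.

```python
import time


class Replica:
    """One chunk replica: stores records at whatever offset it is told."""

    def __init__(self):
        self.data = bytearray()

    def apply(self, offset, record):
        # Pad first if this replica is shorter than the chosen offset,
        # so the record lands at the same offset on every replica.
        if len(self.data) < offset:
            self.data.extend(b"\0" * (offset - len(self.data)))
        self.data[offset:offset + len(record)] = record


class Primary(Replica):
    """Replica holding the lease; it alone picks the order and the offset."""

    def __init__(self, secondaries, lease_expiry):
        super().__init__()
        self.secondaries = secondaries
        self.lease_expiry = lease_expiry  # assumed to be granted by the master

    def record_append(self, record, now=None):
        now = time.time() if now is None else now
        if now >= self.lease_expiry:
            # Without a valid lease this replica may no longer order updates.
            raise RuntimeError("lease expired; ask the master for the primary")
        offset = len(self.data)  # the primary, not the client, picks the offset
        self.apply(offset, record)
        for secondary in self.secondaries:
            secondary.apply(offset, record)  # same order, same offset everywhere
        return offset


if __name__ == "__main__":
    s1, s2 = Replica(), Replica()
    primary = Primary([s1, s2], lease_expiry=time.time() + 60)
    print(primary.record_append(b"aa"))   # offset chosen by the primary
    print(primary.record_append(b"bbb"))  # appended after the first record
    print(bytes(s1.data) == bytes(primary.data) == bytes(s2.data))
```

In this sketch the master appears only as the source of lease_expiry; every append is serialized by the primary alone, which is the property point 4 praises.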

One major flaw:

The paper reads rather like a case study.  Although the techniques are useful, most are not general enough, and few design principles that others could reuse are discussed.