CS262B Reading Summary
The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Summary by Feng Zhou
2/4/2004
Strong points of the paper are:
- This is a "domain-specific" file system designed to exploit many
characteristics of its target applications. Some techniques are very
interesting, and the performance/availability results are very encouraging.
- Two important properties of the usage pattern are that bandwidth
matters more than latency and that files are large. This means
designs aimed at minimizing control messages and cache-coherence
messages can go away: data traffic dominates the network because
files are large, and applications don't care about a few extra
message round trips when opening a file. This simplifies the
design a lot and lets the designers lean more toward availability
and manageability.
- The "record append" semantics suits a lot of applications. It is
both useful and efficient to implement. It is therefore better
than enforcing UNIX file semantics over the network, which has
been shown to be slow at best, if not impossible.
- Leases combined with a primary replica are a good mechanism for
ordering updates with good performance. This is better than
letting the master do all the ordering.
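
The last two points can be illustrated with a minimal Python sketch. It
is not the paper's implementation: the class names, the
record_append helper, and the lease duration are all hypothetical, and
failures, retries, and the network are ignored. It only shows the
division of labor: the master grants a lease, and the lease-holding
primary picks the position of each appended record, which the other
replicas then follow.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical chunk replica holding an ordered list of records."""
    name: str
    records: list = field(default_factory=list)

    def apply(self, serial, record):
        # Apply mutations strictly in the serial order chosen by the primary.
        assert serial == len(self.records), "mutation applied out of order"
        self.records.append(record)


class Master:
    """Hypothetical master: it hands out leases but orders no data."""

    def __init__(self, lease_seconds=60.0):  # duration is a placeholder
        self.lease_seconds = lease_seconds
        self.leases = {}  # chunk id -> (primary replica, expiry time)

    def primary_for(self, chunk_id, replicas, now=None):
        now = time.monotonic() if now is None else now
        holder = self.leases.get(chunk_id)
        if holder is None or holder[1] <= now:
            # No valid lease: grant one (here, simply to the first replica).
            holder = (replicas[0], now + self.lease_seconds)
            self.leases[chunk_id] = holder
        return holder[0]


def record_append(master, chunk_id, replicas, record):
    """Append a record; the primary, not the client, picks its position."""
    primary = master.primary_for(chunk_id, replicas)
    serial = len(primary.records)  # primary assigns the next serial number
    primary.apply(serial, record)
    for replica in replicas:
        if replica is not primary:
            replica.apply(serial, record)  # secondaries follow the same order
    return serial


master = Master()
replicas = [Replica("a"), Replica("b"), Replica("c")]
positions = [record_append(master, "chunk-0", replicas, r) for r in ("x", "y", "z")]
```

In this sketch the master is consulted only to find the current lease
holder; every ordering decision is made by the primary, so the master
stays off the data path.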
One major flaw:
The paper reads largely as a case study. Although useful, most of
its techniques are not general enough, and it discusses few design
principles that others could apply.