CS262B Reading Summary

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Krishna Gummadi et al.

Summary by Feng Zhou
3/15/04

Strong points of the paper are:

  1. Through a large-scale (60,000-user community), long-running (200-day) trace, this paper presents unprecedentedly detailed data about the Kazaa workload and makes several insightful observations about it. For example, Kazaa users are more patient than Web users. More importantly, the "fetch-at-most-once" property is claimed to be the primary feature of Kazaa object dynamics. This property, which follows directly from the immutability of multimedia objects, has important implications for the overall behavior of the workload. One implication is that it makes the access pattern non-Zipf. Another way to think about the "fetch-at-most-once" property is that every client has a local cache (the local file system) of all objects it has ever fetched. Because objects never change, accessing a previously fetched object is always a hit.
  2. A synthetic model of the P2P file-sharing workload is built, in which underlying Zipf accesses to objects are masked by the "fetch-at-most-once" property. The simulated object popularity agrees with the trace, which validates the model for this trace. The effectiveness of a shared proxy cache is then simulated on this model. The results suggest that an ideal proxy cache can achieve a very good hit rate (60%-90%), but the cache size required to reach a good hit rate is large.
  3. The last major result is that locality-aware P2P systems can save a large amount of external bandwidth. Although the effectiveness certainly depends on the number of active users inside the organization, the trace shows that for a university-sized organization, more than 86% of external bandwidth can be saved by using an ideal locality-aware P2P system. One contributing factor to this high effectiveness is that popular objects have good availability because more users have them.
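
To make the "Zipf masked by fetch-at-most-once" idea in points 1 and 2 concrete, here is a minimal sketch, not the authors' simulator. All names and parameters (simulate, zipf_cum_weights, alpha, the client and object counts) are my own assumptions for illustration. Each client draws requests from a Zipf distribution. With the flag on, a repeated request is treated as a local hit and causes no download, following the "local cache" reading above. The paper's model may handle repeats differently.

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_cum_weights(num_objects, alpha=1.0):
    """Cumulative Zipf weights: the object of rank r has weight 1 / r**alpha."""
    return list(accumulate(1.0 / rank ** alpha
                           for rank in range(1, num_objects + 1)))


def simulate(num_clients, num_objects, requests_per_client,
             fetch_at_most_once, alpha=1.0, seed=0):
    """Count downloads per object (object 0 is the most popular).

    Assumption for this sketch: when fetch_at_most_once is True, a request
    for an object the client already holds is served locally and dropped.
    """
    rng = random.Random(seed)
    cum = zipf_cum_weights(num_objects, alpha)
    objects = range(num_objects)
    downloads = Counter()
    for _ in range(num_clients):
        fetched = set()  # this client's local file system
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum)[0]
            if fetch_at_most_once and obj in fetched:
                continue  # local hit: immutable object, nothing to download
            fetched.add(obj)
            downloads[obj] += 1
    return downloads
```

Comparing simulate(..., fetch_at_most_once=False) with simulate(..., fetch_at_most_once=True) shows the effect on the head of the distribution: without the flag, the most popular objects are downloaded many times per client, while with it no object can be downloaded more than num_clients times, so the popularity curve flattens at the top and is no longer Zipf.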

One major flaw:

The claim in Section 4.4 that new clients cannot stabilize performance is rather counter-intuitive. Presumably, under normal operation, the popularity distribution over all objects is stable and the popularity of individual objects changes only gradually over time. A centralized proxy cache should then see a stable hit rate, rather than an ever-decreasing one.