CS262B Reading Summary
Measurement, Modeling, and Analysis of a
Peer-to-Peer File-Sharing Workload
Krishna Gummadi et al.
Summary by Feng Zhou
3/15/04
Strong points of the paper are:
- Using a large-scale (a community of 60,000 users), long-term (200
days) trace, the paper presents unprecedentedly detailed data about the
Kazaa workload and makes several insightful observations about it. For
example, Kazaa users are more patient than Web users.
More importantly, the paper claims that the "fetch-at-most-once"
property is the primary feature of Kazaa object dynamics. This property,
which follows directly from the immutability of multimedia objects, has
important implications for the overall behavior of the workload. One
implication is that it makes the access pattern non-Zipf: because each
client requests a given object at most once, the most popular objects
receive far fewer requests than a Zipf distribution would predict, which
flattens the head of the popularity curve. Another way to think about
the "fetch-at-most-once" property is that every client has a local cache
(the local file system) holding every object it has ever fetched.
Because objects never change, re-accessing a previously fetched object
is always a hit in that local cache.
- The authors build a model of P2P file-sharing workloads in which
clients draw objects from an underlying Zipf distribution, masked by the
"fetch-at-most-once" property. The object popularity produced by
simulating the model agrees with the trace, which validates the model
against this workload. The authors then use the model to simulate the
effectiveness of a shared proxy cache. The results suggest that an ideal
proxy cache can achieve a very good hit rate (60%-90%), but the cache
size required to reach such hit rates is large.
- The last major result is that locality-aware P2P systems can save a
large amount of external bandwidth. Although the effectiveness certainly
depends on the number of active users inside the organization, the trace
shows that for a university-sized organization, more than 86% of
external bandwidth can be saved by an ideal locality-aware P2P system.
One contributing factor to this high effectiveness is that popular
objects have good availability, because more users hold copies of them.
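The model and the ideal proxy cache described above can be illustrated
with a short Python sketch. It is not the authors' simulator: the
function names, the Zipf exponent, and the client, object, and request
counts are all hypothetical, there are no object or client arrivals, and
the cache is unbounded and never evicts. It compares pure Zipf sampling
with sampling in which a client redraws any object it already holds.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def zipf_cdf(num_objects, alpha):
    """Cumulative Zipf weights for object ranks 0..num_objects-1 (rank 0 is most popular)."""
    return list(accumulate(1.0 / (rank + 1) ** alpha for rank in range(num_objects)))


# Hypothetical default parameters chosen for illustration; not taken from the paper.
def simulate(num_clients=1000, num_objects=10000, requests_per_client=100,
             alpha=1.0, at_most_once=True, seed=0):
    """Return (request count per object, hit rate of an ideal shared proxy cache).

    Every client samples objects from the same Zipf distribution. With
    at_most_once=True a client redraws whenever it picks an object it already
    holds, modeling fetch-at-most-once. The proxy cache is ideal (unbounded,
    no eviction), so a request hits whenever any client fetched the object before.
    """
    if at_most_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    cdf = zipf_cdf(num_objects, alpha)
    total = cdf[-1]
    owned = [set() for _ in range(num_clients)]  # each client's local "file system"
    counts = [0] * num_objects
    cache = set()  # objects already held by the shared proxy cache
    hits = 0
    for _ in range(requests_per_client):
        for client in range(num_clients):
            while True:
                # Inverse-CDF sampling of a Zipf-distributed object rank.
                obj = bisect_left(cdf, rng.random() * total)
                if not at_most_once or obj not in owned[client]:
                    break
            owned[client].add(obj)
            counts[obj] += 1
            if obj in cache:
                hits += 1
            else:
                cache.add(obj)
    return counts, hits / (num_clients * requests_per_client)


if __name__ == "__main__":
    for mode in (False, True):
        counts, hit_rate = simulate(at_most_once=mode)
        top = sorted(counts, reverse=True)[:3]
        print(f"at_most_once={mode}: top-3 counts={top}, ideal proxy hit rate={hit_rate:.2f}")
```

With at_most_once=False the most popular object's total request count
far exceeds num_clients, as Zipf predicts; with at_most_once=True it is
capped at num_clients, which flattens the head of the popularity curve.
Because the object set is fixed and the cache is unbounded, the hit rate
here is at least 1 - num_objects / (total requests), so the numbers are
not comparable to the paper's results.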
One major flaw of the paper:
The claim in Section 4.4 that the arrival of new clients cannot
stabilize performance (the proxy cache hit rate) is rather
counter-intuitive. Under normal operation, one would expect the overall
popularity distribution of objects to be stable, with the popularity of
individual objects changing only gradually over time. A centralized
proxy cache would then see a stable hit rate rather than an
ever-decreasing one.