The Berkeley algorithms reading group, Summer 2003


Time: Thursday 1pm
Place: 373 Soda
There are usually one or more papers that you should at least look over before coming, so that we can have a more effective discussion.

Past semesters' reading group topics: Fall 2001; Spring 2002; Summer 2002; Fall 2002; Spring 2003

To subscribe or unsubscribe, click here.

This semester's first topic: "Peer-to-Peer Networks" (some papers)

Date Speaker Abstract & paper reference
May 23 Kris Hildrum I'm going to focus on locating an object in a peer-to-peer network. I'll start by explaining that sentence, including what I mean by "peer-to-peer network", and then I'll talk about some results.

First, I will talk about consistent hashing, which was introduced by Karger et. al in 1997. Traditional hashing schemes distribute items to buckets in a way that depends heavily on the number of buckets. Adding or removing a bucket means recomputing the correct bucket for all the items and moving many of them. With consistent hashing, adding a bucket means only items movings to the new bucket move, and all other items remain where they are. Likewise, deleting a bucket means only moving items out of the deleted bucket. I will then discuss one way (Chord) in which this sort of scheme can be used to find objects in peer-to-peer networks. (Indeed, consistent hashing so fundamental to peer-to-peer networks that people tend to take it for granted.)

Second, I will talk about object location when there are many copies of an object, and we would like to find a copy close to the search starting point without traveling far in the process. I will present the algorithm and data structure Awerbuch and Peleg (here) use to solve this problem.

Finally, if there's time, I will combine these and mix in some metric embeddings to show how HST (hierarchically well-separated trees) can be used to locate close copies of objects. (here).

Don't feel like you have to look at any of the papers I linked to; I'll assume you haven't.

May 29th
1pm
Kris Hildrum (again) Continued from last time. See the last three paragraphs of the last abstract.

June 5th
2pm
Lawrence Ip Edith Cohen and Scott Shenker, "Replication Strategies in Unstructured Peer-to-Peer Networks", SIGCOMM 02

If we have an unstructured network P2P then we can model search probes as a probe to a random node. Then the expected number of queries needed to find an object only depends on the number of copies of that object. The optimal number of objects obviously depends on the query distribution over the objects.

Cohen and Shenker analyze the expected number of queries for two different schemes, uniform (where each object has the same number of copies), and proportional (where the number of copies of each object is proportional to the frequency of queries for that object), and show that surprisingly, neither is optimal. The optimal number of copies turns out to be proportional to the square root of the frequency of queries.

I'll show where the square root comes from and mention a few schemes they came up with to make the right number of copies.

June 19th
1pm
Kamalika Finding Nearest Neighbors in Growth-Restricted Metrics, David Karger and Matthias Ruhl, STOC 2002 I will talk about a dynamic data structure to do efficient nearest neigbor finding and range queries in "growth restricted" metrics. In a growth restricted metric, the number of points in a ball of size 2r around any point, say p, is at most a constant times more than the number of points in a ball of size r around p. Such spaces have some nice properties which are exploited in the performance analysis of the data structure.
June 26th
1pm
Ben I'll go over the paper "Fault-tolerant routing in peer-to-peer systems" by Aspnes, Diamadi & Shah (2002). A web page with short and full versions: http://cs-www.cs.yale.edu/homes/aspnes/greedy-routing-abstract.html .

They give lower bounds for greedy routing -- i.e., at each step, you take the link getting you as close as possible to your destination. When the expected number of outgoing links l is less than log n, their bounds are better than the trivial bound for s-step connectivity l^s >= n.

Their proof is fairly intuitive, averaging over bad cases. However, it only applies in one dimension, and one bound is probably not tight -- so there's room for improvement.

July 3rd
1pm
Canceled.
July 10th
1pm
Brighten We present Naps, a randomized, distributed algorithm for determining which nodes of an ad hoc network should ``nap'' such that the set of waking nodes is well-distributed and of a desired, lower density. The algorithm exploits the structure of random geometric graphs, and does not rely on geographical information. Furthermore, it is robust under a large range of environmental assumptions. This algorithm may enable a practical deployment scheme, in which nodes are deployed with a density far greater than is necessary for connectivity or sensing coverage in order to extend the lifetime of the system. It may also be used to select a backbone of nodes to manage routing in a dense network to reduce signal interference and contention.

We give a percolation theoretic analysis of the connectivity properties of the graphs obtained after running Naps. Specifically, our algorithm takes a single parameter, the "neighbor threshold". We show that there exist \lambda and c such that for any node density at least \lambda and any neighbor threshold at least c, there is a non-zero probability that a particular waking node is connected to an infinite-sized component when the process is run on an infinite area.

We also examine empirical simulation results for a wide range of initial deployment densities, heterogenous densities, and a mobile setting.

This is joint work with Alex Fabrikant and David Ratajczak.

July 17th
1pm
Robi I will talk about a recent paper of Abraham, Malkhi, and Dobzinski, titled "LAND: Locality Aware Networks for Distributed Hash Tables." (pdf)

This paper proposes a peer-to-peer network scheme that supports objects lookup with constant distortion (i.e., stretch) of distances, with constant expected number of links per node. In some sense, this is a version of the PRR scheme with "dual" performance guarantees and simpler proofs. Their scheme actually distinguishes between regular and ultra nodes, but I will ignore this issue in the talk.

July 24th
1pm
Andrej
July 10th
1pm
Eran