DiSC: A Distributed Searchable Cache

DiSC is a project inspired by a trip I made to Ghana in 2001. During this trip, I observed that while many Internet cafes in Ghana had modern computers with fast processors, ample memory, large hard drives, and fast LAN connections, their WAN connections were slow and intermittent. Furthermore, the university library I visited could not afford Internet connectivity, although it had computers available for the task.

I therefore conceived of a web cache that is distributed over the entire local network. This allows the cache to take advantage of all of the excess storage in the network and avoids the need for a centralized proxy server, thus reducing hardware costs (and completely eliminating additional capital outlay for existing networks). By making the cache searchable, we allow for completely disconnected operation and avoid wide-area latency for search queries.
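As a rough illustration of how documents might be spread across LAN machines without a central proxy, here is a minimal sketch. It assumes rendezvous (highest-random-weight) hashing to pick an owner machine for each URL and a small word-to-URL index on every machine for offline search. The hashing scheme, the class names, and the node names are illustrative assumptions, not the design of the preliminary DiSC code.

```python
import hashlib
from collections import defaultdict


def owner_node(url, nodes):
    """Pick the node that stores `url` via rendezvous hashing (an assumed scheme)."""
    def weight(node):
        digest = hashlib.sha1(f"{node}|{url}".encode("utf-8")).hexdigest()
        return int(digest, 16)
    # Every client computes the same owner, so no central proxy is consulted.
    return max(nodes, key=weight)


class CacheNode:
    """Hypothetical per-machine store: cached pages plus a word -> URLs index."""

    def __init__(self, name):
        self.name = name
        self.pages = {}
        self.index = defaultdict(set)

    def store(self, url, text):
        self.pages[url] = text
        for word in text.lower().split():
            self.index[word].add(url)

    def search(self, word):
        return set(self.index.get(word.lower(), set()))


class DistributedCache:
    """Hypothetical in-process stand-in for the machines on one LAN."""

    def __init__(self, node_names):
        self.nodes = {name: CacheNode(name) for name in node_names}

    def put(self, url, text):
        self.nodes[owner_node(url, sorted(self.nodes))].store(url, text)

    def get(self, url):
        return self.nodes[owner_node(url, sorted(self.nodes))].pages.get(url)

    def search(self, word):
        # Each node answers from its local index, so no WAN link is needed.
        hits = set()
        for node in self.nodes.values():
            hits |= node.search(word)
        return hits
```

A real deployment would replace the in-process dictionary of nodes with network calls between machines and would need to handle machines joining and leaving; the sketch only shows the placement and search idea.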

I envision a system in a library that caches only library-like documents: technical papers, news articles, online books, online journals, and other relatively static, mostly textual documents. I believe that even with a modest network, a sizable percentage of such documents on the Internet could be cached and made searchable.
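One simple way to approximate "library-like" is to admit documents by their HTTP `Content-Type` and to respect the standard `Cache-Control` directives `no-store` and `private`. The sketch below does this; the function name and the particular set of MIME types are assumptions chosen for illustration, and a real policy would likely need more signals than these two headers.

```python
# Assumed set of mostly textual MIME types; adjust to the library's needs.
TEXTUAL_TYPES = {
    "text/html",
    "text/plain",
    "application/pdf",
    "application/postscript",
}


def is_library_like(content_type, cache_control=""):
    """Return True if a response looks like a static, textual document worth caching."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    directives = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
    # Respect origin servers that forbid shared caching.
    if "no-store" in directives or "private" in directives:
        return False
    return mime in TEXTUAL_TYPES
```

For example, `is_library_like("text/html; charset=utf-8")` admits an ordinary web page, while `is_library_like("image/png")` rejects an image.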

In a project for the class IT for Developing Regions at UC Berkeley, my partners and I found some evidence of a significant correlation between queries and the data in the cache. We suspect that this locality will actually yield more relevant results, since the data has been "hand picked" by other users of the cache.

I intend to study this relationship further, including techniques for exploiting locality in queries to improve cache performance and vice versa, as well as other search ranking techniques which take advantage of the cache's ability to monitor all web activity.

DiSC Links

  • Our journal of our trip.
  • The SourceForge project has the preliminary code which we developed for our experiments.
  • A Wiki for our project
  • The paper which describes the results of our experiments
  • An article on distributed power generation, monitoring, etc., in the context of the US, but some ideas might apply to helping developing countries.
  • Africa Source: African Free and Open Source Software Developers Meeting
  • Wizzy Digital Computer: a project a little like TEK
  • The TIER Project at UC Berkeley

Department of Computer Science
205 Cory Hall #1772
University of California
Berkeley, CA
94720-1776
My office is at:
545S Cory Hall
Last Modified: Tuesday, 05-Oct-2004 12:55:02 PDT