Research

Broadly speaking, my research focuses on integrating data from multiple sources into a single dataspace. A dataspace is a next-generation data integration platform built on the principles of data coexistence and pay-as-you-go integration. In emerging areas such as consumer-facing data management (e.g., the web) and large-scale sensor-based applications, one cannot hope to tightly integrate all data sources. Thus, dataspaces propose an incremental approach to integration. Data encompassed by a dataspace is immediately accessible through simple means such as keyword search. Over time, through a combination of semi-automated techniques and judicious use of human effort, a dataspace gradually incorporates more and more understanding of its underlying sources.

Prior to this vein of research, I explored various aspects of "bridging the physical-digital divide": integrating data captured by physical sensor devices (e.g., RFID technology and wireless sensor networks) into traditional data processing infrastructures. This work was done as part of UC Berkeley's HiFi project, a distributed hierarchical system for managing and processing sensor-based data streams.

In the past, I have worked with the PIER group at UC Berkeley, as well as with the database research group at the University of Wisconsin-Madison, researching P2P query processing.

My advisor is Mike Franklin.



Publications:

Dissertation

Dataspaces

Sensor-related

HiFi

Data Furnace

Peer-to-peer

Talks: