My research focus is, broadly speaking, on integrating data from
multiple sources into a single dataspace. A dataspace is a next-generation data
integration platform built on the principles of data coexistence and
pay-as-you-go integration. The motivation is that in emerging areas such as
consumer-facing data management (i.e., the web) and large-scale
sensor-based applications, one cannot hope to tightly integrate all
data sources up front. Dataspaces therefore propose an incremental approach to
integration. Data encompassed by a dataspace is immediately
accessible through simple means such as keyword search. Over time,
through the combination of semi-automated techniques with judicious
use of human effort, a dataspace gradually incorporates more and more
understanding of its underlying sources.
Prior to this vein of research, I explored various aspects of
"bridging the physical-digital divide": integrating data captured by
physical sensor devices (e.g., RFID technology and wireless sensor
networks) into traditional data processing infrastructures. This work
was done as part of UC Berkeley's HiFi project, a distributed
hierarchical system for managing and processing sensor-based data
streams.
Shariq Rizvi, Shawn R. Jeffery, Sailesh Krishnamurthy, Michael J. Franklin, Nathan Burkhart, Anil Edakkunni, Linus Liang: Events on the Edge. SIGMOD 2005 (Demonstration)
Minos Garofalakis, Kurt P. Brown, Michael J. Franklin, Joseph M. Hellerstein, Daisy Zhe Wang, Eirinaios Michelakis, Liviu Tancau, Eugene Wu, Shawn R. Jeffery, Ryan Aipperspach: Probabilistic Data Management for Pervasive Computing: The Data Furnace Project. IEEE Data Engineering Bulletin, Vol. 29, No. 1, March 2006 (Special Issue on Probabilistic Data Management), pp. 57-63
Brent Chun, Joseph M. Hellerstein, Ryan Huebsch, Shawn R. Jeffery, Boon Thau Loo, Sam Mardanbeigi, Timothy Roscoe, Sean C. Rhea, Scott Shenker, Ion Stoica: Querying at Internet-Scale. SIGMOD 2004 (Demonstration)