Projects
Main Page
- Consistency of DHT Routing: A
theoretical analysis of the correctness of DHT-based routing.
- Discoverer: A tool for mining packet format specs from a tcpdump
trace.
- ROFL: A Internet routing algorithm that
works with flat labels.
- Session Analysis: A tool for mining
sessions (groups of related TCP connections) and their structure for
typical applications from a network trace.
- SmartSeer: A distributed version of
CiteSeer that allows continuous queries over documents stored on a
Planetlab-like system.
- Cooperative worm containment: An
analytical study of how well information propagation among firewalls
works against worm attacks.
- OCALA: A end-user proxy that allows unmodified
applications access to new network architectures.
- Hush: A scalable un-structured overlay for
anonymous web browsing.
- Consistency of DHT Routing: A theoretical analysis of the
correctness of DHT-based routing.
(with Matthew Caesar, Ion
Stoica, and Scott Shenker)
Abstract: DHT-based routing has been proposed recently in the
literature and this new routing paradigm promises several advantages
(such as, decreased state requirement, maintenance overhead) over
conventional routing protocols. However, the robustness of such
schemes is questionable given the difficulty of maintaining even
overlay DHTs correctly on Planetlab.
In this work, we address this issue head-on by taking the first steps
towards a theory of the consistency of DHT-based routing. We do so by
proposing the first fully-decentralized maintenance algorithm for
DHT-based routing and then prove that it guarantees the property of
eventual consistency. We also present simulation results that verify
the correctness of our maintenance algorithm converges correctly under
stress conditions. Although it is presently unclear whether DHT-based
routing will ever be deployed in the Internet, we address one of the
main technical objections against them: their perceived lack of
robustness.
Tech Rep.
- Discoverer: A tool for mining packet format specs from a tcpdump
trace.
(with Weidong Cui and Helen Wang)
Abstract: Application-level protocol specifications are useful for
many security applications, including intrusion prevention and
detection that performs deep packet inspection and traffic
normalization, and penetration testing that generates network inputs
to an application to uncover potential vulnerabilities. However,
current practice in deriving protocol specifications is mostly
manual. In this paper, we present Discoverer, a tool for automatically
reverse engineering the protocol message formats of an application
from its network trace. A key property of Discoverer is that it
operates in a protocol-independent fashion by inferring protocol
idioms commonly seen in message formats of many application-level
protocols. We evaluated the efficacy of Discoverer over one text
protocol (HTTP) and two binary protocols (RPC and CIFS/SMB) by
comparing our inferred formats with true formats obtained from
Ethereal. For all three protocols, more than 90% of our inferred
formats correspond to exactly one true format; one true format is
reflected in five inferred formats on average; our inferred formats
cover over 95% of messages, which belong to 30-40% of true formats
observed in the trace.
Paper. Talk (Weidong Cui;
Usenix Security 2007).
- ROFL: A Internet routing algorithm that works with flat labels.
(with Matthew Caesar, Tyson Condie, Karthik Lakshminarayanan, Ion
Stoica, and Scott Shenker)
Abstract: It is accepted wisdom that the current Internet architecture
connotes network locations and host identities, but there is no
agreement on how a future architecture should distinguish the two. One
could sidestep this quandary by routing directly on host identities
themselves, and eliminating the need for network-layer protocols to
include any mention of network location. The key to achieving this is
the ability to route on at labels. In this paper we take an initial
stab at this challenge, proposing and analyzing our ROFL routing
algorithm. While its scaling and efficiency properties are far from
ideal, our results suggest that the idea of routing on labels cannot
be immediately dismissed.
Paper. Talk (Matthew Caesar; SIGCOMM 2006).
- Session Analysis: A tool for mining sessions (groups of related
TCP connections) and their structure for typical applications from a
network trace.
(with Jaeyeon Jung, Vern Paxson, and Can Emre Koksal)
Abstract: While the problem of analyzing network traffic at the
granularity of individual connections has seen considerable previous
work and tool development, understanding traffic at a higher level the
structure of user-initiated sessions comprised of groups of related
connections remains much less explored. Some types of session
structure, such as the coupling between an FTP control connection and
the data connections it spawns, have prespecified forms, though the
specifications do not guarantee how the forms appear in
practice. Other types of sessions, such as a user reading email with a
browser, only manifest empirically. Still other sessions might exist
without us even knowing of their presence, such as a botnet zombie
receiving instructions from its master and proceeding in turn to carry
them out.
We present algorithms rooted in the statistics of Poisson processes
that can mine a large corpus of network connection logs to extract the
apparent structure of application sessions embedded in the
connections. Our methods are semi-automated in that we aim to present
an analyst with high-quality information (expressed as regular
expressions) reflecting different possible abstractions of an
application's session structure. We develop and test our methods using
traces from a large Internet site, finding diversity in the number of
applications that manifest, their different session structures, and
the presence of abnormal behavior. Our work has applications to
traffic characterization and monitoring, source models for
synthesizing network traffic, and anomaly detection.
Paper.
Tech Rep. Code. Talk (IMC 2006).
- SmartSeer: A distributed version of CiteSeer that allows
personalized continuous queries.
(with Beverly Yang, Scott
Shenker, Puneet Sharma, Sujata Banerjee, Sujoy Basu, and Sung Ju Lee)
Abstract: As the academic world moves away from physical journals and
proceedings towards online document repositories, the ability to
efficiently locate work of interest among the torrent of
newly-generated papers will become increasingly important. To aid in
this endeavor, we designed SmartSeer, a system that allows users to
register personalized continuous queries over the CiteSeer database of
technical documents. Users are then alerted whenever papers that match
their queries are put online.
SmartSeer has two main design requirements. First, to allow effective
information retrieval, it should support rich continuous queries (as
opposed to simple keyword searches). Second, to make effective use of
donated infrastructure, it should be capable of running on a loosely
maintained group of unreliable machines spread across multiple
organizations (as opposed to assuming a reliable and tightly coupled
distributed system). Existing work on distributed continuous query
systems fails at least one of these requirements. Our design for
SmartSeer is based on Distributed Hash Tables (DHTs), and thereby
leverages previous work on DHT-based query systems. A prototype of
SmartSeer has been implemented and evaluated on Planetlab. Though we
evaluate our design only for the SmartSeer application, we believe it
also provides useful insights into other distributed and rich
continuous query systems (web alerts, news alerts etc).
Paper. Code available on
request. Talk (INFOCOM 2006).
- Cooperative worm containment: An analytical study of how well
information propagation among firewalls works against worm
attacks.
(with Lakshminarayanan Subramanian, Ion Stoica, and
Randy H Katz)
Abstract: Fast scanning worms, that can infect nearly the entire
vulnerable population in order of minutes, are among the most serious
threats to the Internet today. In this work, we investigate the
efficacy of cooperation among Internet firewalls in containing such
worms. We first propose a model for firewall-level cooperation and
then study the containment in our model of cooperation using analysis
and simulation. Our results suggest that, with moderate overhead,
cooperation among Internet firewalls can provide 95% containment under
10% deployment while being resilient to 100-1000 malicious
firewalls.
Paper. Talk (SRUTI 2005).
- OCALA: A end-user proxy that allows unmodified applications
access to new network architectures.
(with Dilip Joseph, Ayumu
Kubota, Karthik Lakshminarayanan, Ion Stoica, Klaus Wehrle)
Abstract: In order for overlays and new network architectures to gain
real user acceptance, users should be able to leverage overlay
functionality without any modifications to their applications and
operating systems. We present our design, implementation, and
experience with OCALA, an overlay convergence architecture that
achieves this goal. OCALA interposes an overlay convergence layer
below the transport layer. This layer is composed of an overlay
independent sub-layer that interfaces with legacy applications, and an
overlay dependent sub-layer that delivers packets to the
overlay. Unlike previous efforts, OCALA enables: (a) simultaneous
access to multiple overlays (b) communication between hosts in
different overlays (c) communication between overlay hosts and legacy
hosts (d) extensibility, allowing researchers to incorporate their
overlays into OCALA. We currently support five overlays, i3 ,RON , HIP
, DOA and OverDoSe on Linux, Windows XP/2000 and Mac OS X. We (and a
few other research groups and end-users) have used OCALA for over a
year with many legacy applications ranging from web browsers to remote
desktop applications.
Paper. Code. Talk. (Dilip Joseph; NSDI 2006)
- Hush: A scalable un-structured overlay for anonymous web
browsing.
Abstract: Anonymous communication over an open network such as the
Internet has long been recognized as a desirable feature, and there
have been several proposals for building overlay networks for this
purpose (Chaumian Networks, Onion Routing, Tarzan, Crowds, DC-Nets,
Herbivore). These networks are however not feasible in practice for
one or more of the following reasons: poor scalability, low
efficiency, and no long-term anonymity. This work describes Hush, an
anonymization overlay, that addresses these issues using probabilistic
forwarding schemes based on hash functions. We analyze the security
properties of Hush to derive its anonymity guarantees against various
classes of adversaries and compare it with existing anonymization
overlays. We then further extend our analysis techniques to comment
qualitatively and quantitatively on the capability and limitations of
different classes of anonymization overlays. The scalability and
efficiency of Hush make it a practical proposition for applications
like anonymous web browsing and anonymizing peer-to-peer
networks.
Tech Rep.
Code. Related Talk
(Group Meeting).