Tools for Session Analysis
DFA representation of the structure of HTTP sessions on port 8080
Intro
Sessions are groups of TCP connections that are related to one
another. Some types of session structure, such as the coupling
between an FTP control connection and the data connections it spawns,
have prespecified forms, though the specifications do not guarantee
how the forms appear in practice. Other types of sessions, such as a
user reading email with a browser, only manifest empirically. Still
other sessions might exist without us even knowing of their presence,
such as a botnet zombie receiving instructions from its master and
proceeding in turn to carry them out.
The code below can discover these kinds of sessions by mining a TCP
connection trace, and it then classifies them into normal sessions
(the first two kinds) and abnormal (sometimes the last kind). For the
normal sessions, it can further semi-automatically generate succinct
descriptors for the session structure in the form of DFAs. These DFAs
are formatted so a human analyst can understand them easily.
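To make the notion of a session concrete, here is a minimal sketch that groups connections from the same originating host when they start close together in time. This is a naive stand-in for illustration only, not the method used by ses_extract (see the IMC paper for that). The record layout, the sample data, and the 60-second gap threshold are all assumptions.

```python
from collections import defaultdict

# Hypothetical connection records: (start_time_seconds, originator, responder, service).
# This is NOT the Bro connection-log format; it is a simplified stand-in.
CONNECTIONS = [
    (0.0, "10.0.0.1", "192.0.2.10", "ftp"),
    (1.5, "10.0.0.1", "192.0.2.10", "ftp-data"),
    (2.0, "10.0.0.2", "192.0.2.20", "http"),
    (3.0, "10.0.0.1", "192.0.2.10", "ftp-data"),
    (500.0, "10.0.0.1", "192.0.2.30", "smtp"),
]

MAX_GAP = 60.0  # assumed idle threshold in seconds; not a value from the paper


def group_sessions(connections, max_gap=MAX_GAP):
    """Group each originator's connections into sessions by time proximity."""
    by_host = defaultdict(list)
    for conn in sorted(connections):  # sorts by start time first
        by_host[conn[1]].append(conn)

    sessions = []
    for host in sorted(by_host):
        current = []
        for conn in by_host[host]:
            # Start a new session when the gap since the previous connection is too long.
            if current and conn[0] - current[-1][0] > max_gap:
                sessions.append(current)
                current = []
            current.append(conn)
        sessions.append(current)
    return sessions


SESSIONS = group_sessions(CONNECTIONS)
for session in SESSIONS:
    print([conn[3] for conn in session])
```

On the sample data, the FTP control connection and its two data connections end up in one session, while the much later SMTP connection from the same host starts a new one.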
Code
Download code.
The code is split into three offline tools (ses_extract,
ses_abstract, ses_abnormal) and one online intrusion detection tool
(ses_abnormal_bro). They are all bundled in the tgz above. To
compile these tools, unpack the archive and run "make all". This has
been tested on Linux (FC3) and FreeBSD. License: BSD.
ses_extract: Takes as input a Bro TCP connection log, and produces as
output a list of all sessions detected in the log. See "README" in the
distribution for how to use this.
ses_abstract: Takes as input a list of sessions (produced by
ses_extract). It classifies them by application. Then, for each
application, it generates a set of DFAs (deterministic finite
automata; essentially regular expressions) that capture the
structure of these sessions at various levels. It also generates a
coverage curve to help decide which DFA is most suitable for
representing the session structure of an application. This is available
in the directory "ses_abstract/". Instructions are available in
"ses_abstract/README". This tool requires the FSA library and dot
(from Graphviz) to be installed (see Links).
ses_abnormal: Takes as input a list of sessions (produced by
ses_extract). It then reports abnormal sessions (based on frequency of
occurrence). This is available in the directory "ses_abnormal/" of the
archive. Instructions for using it are in
"ses_abnormal/README".
ses_abnormal_bro: This is a plugin for the Bro IDS that finds abnormal
sessions by analyzing network traffic online. It re-implements
ses_extract and ses_abnormal in the Bro language; it can be used for
offline analysis too. This is available in the directory "ses_abnormal_bro/"
of the archive. Instructions for using it are in
"ses_abnormal_bro/README". This requires the
Bro software (any version since 2005 will do).
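As an illustration of the kind of descriptor ses_abstract produces, the sketch below encodes a DFA over connection types and measures how many sessions it accepts. It assumes that "coverage" means the fraction of observed sessions a DFA accepts. The DFA, the state names, and the sample sessions are hypothetical and are not output of the tool.

```python
# Hypothetical DFA for an FTP-like session: one control connection
# followed by zero or more data connections (regex: ftp ftp-data*).
FTP_DFA = {
    "start": "q0",
    "accept": {"q1"},
    "delta": {
        ("q0", "ftp"): "q1",
        ("q1", "ftp-data"): "q1",
    },
}


def accepts(dfa, session):
    """Return True if the DFA accepts the sequence of connection types."""
    state = dfa["start"]
    for symbol in session:
        state = dfa["delta"].get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in dfa["accept"]


def coverage(dfa, sessions):
    """Fraction of sessions that the DFA accepts (assumed meaning of coverage)."""
    if not sessions:
        return 0.0
    return sum(accepts(dfa, s) for s in sessions) / len(sessions)


# Made-up sessions, each a sequence of connection types.
SAMPLE = [
    ["ftp"],
    ["ftp", "ftp-data"],
    ["ftp", "ftp-data", "ftp-data"],
    ["ftp-data", "ftp"],
]
print(coverage(FTP_DFA, SAMPLE))
```

A simpler DFA is easier to read but typically accepts fewer of the observed sessions; computing coverage for several candidate DFAs is one way to compare that trade-off.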
Data
The data we used for our paper is private. However, some public data
is available on which you can try the code: LBL connection logs from
the ITA (LBL-CONN-7).
Apps
These tools are of use mostly to researchers and network
administrators.
ses_extract can be used by researchers to work hands-on with the
sessions in their trace. It is a useful preprocessing tool for handling
flow logs in terms of sessions rather than connections. This makes more
sense in some cases (refer to the IMC paper for details). ses_extract can
also be used by admins to get a higher-level view of large connection
log files.
ses_abstract can be used to infer the structure of protocols used in
the trace and to identify protocols based on their structure.
ses_abnormal can be used for identifying misconfigurations, open
proxies, successful attacks, peer-to-peer applications, etc. The Tech
Rep discusses this in detail.
ses_abnormal_bro is intended to run as an online IDS plugin with Bro,
and can also be used offline if desired.
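The frequency-based idea behind ses_abnormal can be sketched as follows: count how often each session type occurs and flag the types that are rare. This is only an illustration of the general approach. The session abstraction (a tuple of connection types), the sample counts, and the 2% threshold are assumptions, not parameters of the tool.

```python
from collections import Counter

# Made-up sessions, each abstracted to a tuple of connection types.
OBSERVED = (
    [("ftp", "ftp-data")] * 40
    + [("http",)] * 55
    + [("smtp", "http")] * 4
    + [("ssh", "irc")] * 1
)

MIN_FRACTION = 0.02  # assumed rarity threshold; not a parameter of ses_abnormal


def rare_session_types(sessions, min_fraction=MIN_FRACTION):
    """Return session types whose share of all sessions is below min_fraction."""
    counts = Counter(sessions)
    total = len(sessions)
    return sorted(t for t, n in counts.items() if n / total < min_fraction)


print(rare_session_types(OBSERVED))
```

With these made-up counts, only the single ("ssh", "irc") session falls below the threshold. Rare does not necessarily mean malicious, so flagged sessions still need review by an analyst.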
Warnings
There are about 5 parameters in the code that may need tweaking for
datasets other than the 2 main sets of traces we worked
with. These parameters and the values that worked for us are tabulated
in the IMC paper. All parameters are #defined in the code; you can
change them. All of them can also be set via command-line
options.
There are a few heuristics we tried out in the code; these can be
controlled by the appropriate #defines in the code. The distribution,
by default, has the heuristics that worked for us.
The C++ code (ses_extract) was re-written 3 times from scratch over 4
years. The scripts, on the other hand, are in a variety of languages
(perl, bash, sed, awk) and are not quite as mature; they have been
made portable and somewhat readable, but if we were to do this again,
we would just do it in a single language, probably Perl.
Docs
The papers that describe how these tools work are available here. The IMC 2006 paper describes
ses_extract and ses_abstract. The 2005 Tech Rep describes ses_abnormal
and ses_abnormal_bro.
People
Jayanth Kannan, Jaeyeon Jung, Vern Paxson, Can Emre
Koksal. Please contact Jayanth with any questions.
Links
The Bro IDS
eXpose
has the same goal as ses_abstract: infer communication rules for
applications from flow logs. Compared to ses_abstract, eXpose uses
templates to capture more complex session structure and is fully
automatic. However, it may not be able to identify individual
sessions (as ses_extract and ses_abnormal do). ses_abnormal_bro provides
an online implementation; eXpose does not. Also, we think that the
visualization provided by ses_abstract is easier to understand.
Graphviz (we use this for drawing the DFAs nicely)
Finite State Automata Utilities (we use this for performing various
operations on DFAs such as intersection, union, etc.)
Webpage format is inspired by
the vx32 page