Tools for Session Analysis


DFA representation of the structure of HTTP sessions on port 8080

Intro

Sessions are groups of TCP connections that are related to one another. Some types of session structure, such as the coupling between an FTP control connection and the data connections it spawns, have prespecified forms, though the specifications do not guarantee how the forms appear in practice. Other types of sessions, such as a user reading email with a browser, only manifest empirically. Still other sessions might exist without us even knowing of their presence, such as a botnet zombie receiving instructions from its master and proceeding in turn to carry them out.

The code below discovers these kinds of sessions by mining a TCP connection trace, and then classifies them as normal (the first two kinds) or abnormal (sometimes the last kind). For normal sessions, it can also semi-automatically generate succinct descriptors of the session structure in the form of DFAs. These DFAs are formatted so that a human analyst can understand them easily.


Code

Download code. The code is split into three offline tools (ses_extract, ses_abstract, ses_abnormal) and one online intrusion detection tool (ses_abnormal_bro). They are all bundled in the tgz above. To compile the tools, unpack the archive and run "make all". This has been tested on Linux (FC3) and FreeBSD. License: BSD.


ses_extract: Takes as input a Bro TCP connection log, and produces as output a list of all sessions detected in the log. See "README" in the distribution for how to use this.
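
To make the notion of a session concrete, here is a minimal Python sketch that groups connections from the same originator into one session when consecutive connections start close together in time. It is an illustrative heuristic only, not the algorithm ses_extract implements (the IMC paper describes that). The `Conn` record, its fields, and the `max_gap` threshold are hypothetical and do not mirror the Bro connection log format.

```python
from collections import defaultdict
from typing import NamedTuple


class Conn(NamedTuple):
    """Hypothetical connection record (not the Bro log format)."""
    start: float  # connection start time, in seconds
    src: str      # originator host
    dst: str      # responder host
    dport: int    # responder port


def group_sessions(conns, max_gap=60.0):
    """Group each originator's connections into sessions.

    A connection joins the current session if it starts within
    max_gap seconds of the previous connection from the same host;
    otherwise it opens a new session. max_gap is an assumed value.
    """
    by_src = defaultdict(list)
    for c in sorted(conns, key=lambda c: c.start):
        by_src[c.src].append(c)

    sessions = []
    for src_conns in by_src.values():
        current = [src_conns[0]]
        for c in src_conns[1:]:
            if c.start - current[-1].start <= max_gap:
                current.append(c)
            else:
                sessions.append(current)
                current = [c]
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    trace = [
        Conn(0.0, "a", "s", 21),
        Conn(5.0, "a", "s", 20),
        Conn(500.0, "a", "s", 80),
        Conn(3.0, "b", "s", 25),
    ]
    for session in group_sessions(trace):
        print([c.dport for c in session])
```

A fixed time gap is the simplest possible grouping rule; it serves here only to show the shape of the output, a list of sessions where each session is a list of connections.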


ses_abstract: Takes as input a list of sessions (produced by ses_extract) and classifies them by application. Then, for each application, it generates a set of DFAs (deterministic finite automata; essentially regular expressions) that capture the structure of these sessions at various levels of detail. It also generates a coverage curve to help decide which DFA is most suitable for representing the session structure of an application. The tool is in the directory "ses_abstract/" of the archive, with instructions in "ses_abstract/README". It requires the FSA library and dot (from Graphviz) to be installed (links here).
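
As a rough illustration of what a DFA descriptor of session structure looks like, the Python sketch below encodes a toy FTP-like session: one control connection followed by any number of data connections. The `DFA` class, the symbol names "ftp" and "ftp-data", and the `coverage` helper are all hypothetical; in particular, treating coverage as the fraction of observed sessions a DFA accepts is an assumption, and the paper gives the definition ses_abstract actually uses.

```python
class DFA:
    """Minimal deterministic finite automaton over connection-type symbols."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        # Maps (state, symbol) -> next state; missing entries reject.
        self.transitions = dict(transitions)

    def accepts(self, symbols):
        state = self.start
        for sym in symbols:
            key = (state, sym)
            if key not in self.transitions:
                return False
            state = self.transitions[key]
        return state in self.accepting


def coverage(dfa, sessions):
    """Fraction of sessions accepted by dfa (assumed definition)."""
    if not sessions:
        return 0.0
    return sum(dfa.accepts(s) for s in sessions) / len(sessions)


# Toy descriptor: "ftp" followed by zero or more "ftp-data".
ftp_dfa = DFA(
    start=0,
    accepting={1},
    transitions={(0, "ftp"): 1, (1, "ftp-data"): 1},
)

if __name__ == "__main__":
    observed = [["ftp"], ["ftp", "ftp-data"], ["ftp-data"], ["http"]]
    print(ftp_dfa.accepts(["ftp", "ftp-data", "ftp-data"]))
    print(coverage(ftp_dfa, observed))
```

A more permissive DFA accepts more of the observed sessions but says less about their structure, which is the trade-off a coverage curve helps to judge.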


ses_abnormal: Takes as input a list of sessions (produced by ses_extract) and reports the abnormal ones, based on frequency of occurrence. The tool is in the directory "ses_abnormal/" of the archive, with instructions in "ses_abnormal/README".
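
The idea of flagging sessions by frequency of occurrence can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: each session is reduced to a sequence of connection-type labels, and a pattern counts as rare when it makes up less than a hypothetical `min_fraction` of all sessions. The scoring ses_abnormal actually uses is described in the Tech Rep and may differ.

```python
from collections import Counter


def rare_sessions(sessions, min_fraction=0.05):
    """Return session patterns seen in fewer than min_fraction of sessions.

    sessions is a list of label sequences, e.g. ["smtp", "http"].
    min_fraction is an assumed threshold, not a value from the tools.
    """
    counts = Counter(tuple(s) for s in sessions)
    total = len(sessions)
    return [list(pattern) for pattern, n in counts.items()
            if n / total < min_fraction]


if __name__ == "__main__":
    observed = [["http"]] * 30 + [["smtp", "http"]]
    print(rare_sessions(observed))
```

In this toy input the single SMTP-then-HTTP session is reported because it accounts for 1 of 31 sessions, below the assumed 5% threshold.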


ses_abnormal_bro: A plugin for the Bro IDS that finds abnormal sessions by analyzing network traffic online. It re-implements ses_extract and ses_abnormal in the Bro language, and it can be used for offline analysis too. The plugin is in the directory "ses_abnormal_bro/" of the archive, with instructions in "ses_abnormal_bro/README". It requires the Bro software (any version since 2005 will do).

Data

The data we used for our paper is private. However, some public data is available on which you can try the code: the LBL connection logs from the ITA (LBL-CONN-7).

Apps

These tools are of use mostly to researchers and network administrators.

ses_extract lets researchers work hands-on with the sessions in their trace. It is a useful preprocessing tool for handling flow logs in terms of sessions rather than connections, which makes more sense in some cases (refer to the IMC paper for details). Admins can also use ses_extract to get a higher-level view of large connection log files.

ses_abstract can be used to infer the structure of protocols used in the trace and to identify protocols based on their structure.

ses_abnormal can be used to identify misconfigurations, open proxies, successful attacks, peer-to-peer applications, etc. The Tech Rep discusses this in detail.

ses_abnormal_bro is intended to run as an online IDS plugin with Bro, and can also be used offline if desired.


Warnings

There are about 5 parameters in the code that may need tweaking for datasets other than the 2 main sets of traces we worked with. These parameters and the values that worked for us are tabulated in the IMC paper. All parameters are #defined in the code, where you can change them. All of them can also be set via command-line options.

There are a few heuristics we tried out in the code; these can be controlled by the appropriate #defines in the code. The distribution, by default, has the heuristics that worked for us.

The C++ code (ses_extract) was re-written 3 times from scratch over 4 years. The scripts, on the other hand, are in a variety of languages (perl, bash, sed, awk) and are not quite as mature; they have been made portable and somewhat readable, but if we were to do this again, we would just do it in a single language, probably Perl.


Docs

The papers that describe how these tools work are available here. The IMC 2006 paper describes ses_extract and ses_abstract. The 2005 Tech Rep describes ses_abnormal and ses_abnormal_bro.

People

Jayanth Kannan, Jaeyeon Jung, Vern Paxson, Can Emre Koksal. Please contact Jayanth with any questions.

Links

The Bro IDS
eXpose has the same goal as ses_abstract: infer communication rules for applications from flow logs. Compared to ses_abstract, eXpose uses templates to capture more complex session structure and is fully automatic. However, it may not be able to identify individual sessions, as ses_extract and ses_abnormal do. ses_abnormal_bro provides an online implementation; eXpose does not. Also, we think that the visualization provided by ses_abstract is easier to grok.
Graphviz (we use this for drawing the DFAs nicely)
Finite State Automata Utilities (we use this for performing various operations on DFAs, such as intersection and union)
Webpage format is inspired by the vx32 page