Analyzing Web Traffic
ECML/PKDD 2007 Discovery Challenge
September 17-21, 2007, Warsaw, Poland

Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Univ. Montpellier 2
EMA LGI2P
Bee-Ware

Introduction - Main objectives - Dataset - Evaluation - Cooperation

The number of computer attacks grows in tandem with the Web. According to the National Institute of Standards and Technology, American companies suffered losses of up to 59.6 billion dollars from IT attacks as early as 2004. Given the number of IT systems now deployed, intrusion detection is a significant research area, aiming to detect and forecast system attacks as early as possible.

The OSI model is usually drawn as a column of stacked rectangles, each symbolizing a layer of the model. In reality, however, the seventh layer is far wider and more diverse than the layers below it. This application layer is by far the biggest, widest and most complex of all: beyond protocols and parameters, it is made up of languages, scripts, libraries and human concepts. As a consequence, viewed from a security perspective the OSI diagram takes on an inverted-pyramid shape: the higher the layer, the richer and more diverse its content, and therefore the more complex it is to secure.


Trying to filter application traffic as diverse and dynamic as Web traffic quickly reveals several strong constraints and specific requirements that must be fulfilled.

Unknown attack detection
(or zero-day attacks) A major consequence of application diversity is that the range of potential vulnerabilities is unbounded. Experience has already shown that the vast majority of application attacks are of the unknown variety.
False Positives
Considering the richness and diversity of this domain, and given that the threshold of acceptance is user-dependent, avoiding and eliminating false positives is a critical issue when analyzing the application layer.
Ambiguous queries
When looking at existing applications, it quickly becomes obvious that they harbour weaknesses or vulnerabilities. Traffic addressing these resources will then appear to exploit those weaknesses, but cannot be blocked without stopping the application.
Abnormal behaviour detection
Attacks are not the only prevalent danger. Securing Web traffic is a more complex task than mere intrusion prevention; various other types of requests require supervision.

Main objectives

The issue addressed by this challenge is the filtering of application attacks in Web traffic. This is a complex matter because of the diversity of attack purposes and means, the quantity of data involved, and constant technological shifts. Application attacks can span multiple classes and undergo constant change. They do, however, retain some distinguishing features (escaping, bypassing, keywords matching external entities, etc.).

To achieve this aim, data sources from HTTP query logs will be used. Using this data we can not only recognize an attack but also determine which class it belongs to. Participants will have to start from an HTTP query in context and deduce which class it belongs to and what its level of relevance is.

To address this issue in the most efficient way, we will divide the challenge into several tasks:

  1. Task 1: Multi-class and contextual classification
    We have to be able to classify queries that may belong to different classes, and we have to do so according to context. A query in attack form that is not dangerous because it is made in the wrong context has to be labelled accordingly. Since the amount of data to process is considerable due to traffic density, any real-world classification application must be able to process queries extremely quickly. Participants will hence be judged not only on classification performance but also on the running time of their algorithm implementations.

  2. Task 2: Isolation of the attack pattern
    We should be able to pinpoint, within an attack query, the shortest character string that conveys the attack.



Dataset composition

The dataset will be composed of 50,000 samples, of which 20% are attacks. 10% of these attacks will be out of context: they look like real attacks but have no chance of succeeding because they are constructed blindly and do not target the correct entities. A single sample may target several attack classes (SQL injection, command execution, etc.). Each sample is completely independent of the others.

Dataset Format

The dataset will be defined in XML (a portable, standard format). Each sample is identified by a unique id and contains three major parts: Context (describing the environment in which the query is run), Class (describing how an expert would classify the sample), and the description of the query itself.

Context: contains the following attributes:

  1. Operating system running on the Web Server (UNIX, WINDOWS, UNKNOWN).
  2. HTTP Server targeted by the request (APACHE, MIIS, UNKNOWN).
  3. Is the XPATH technology understood by the server? (TRUE, FALSE, UNKNOWN)
  4. Is there an LDAP database on the Web Server? (TRUE, FALSE, UNKNOWN)
  5. Is there an SQL database on the Web Server? (TRUE, FALSE, UNKNOWN)

Classes: lists the different subdivision levels of HTTP query categorization (and how they are represented in the class part of the dataset).

The "type" element indicates which class the request belongs to:

  1. Normal query (Valid)
  2. Cross-Site Scripting (XSS)
  3. SQL Injection (SqlInjection)
  4. LDAP Injection (LdapInjection)
  5. XPATH Injection (XPathInjection)
  6. Path traversal (PathTransversal)
  7. Command execution (OsCommanding)
  8. SSI attacks (SSI)

Moreover, a flag will be added indicating whether a query is within the assigned context or not (element "inContext", taking two values: TRUE or FALSE).

Another element ("attackIntervall") indicates where the attack is located in the query description. This element begins with the name of the element where the attack is located (uri, query, body, header) followed by ":". Thereafter the interval considered as an attack is specified. For headers, we also indicate the name of the header in which the attack is located; the interval then counts from the beginning of that header's value.
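As a concrete illustration, this location notation can be parsed as sketched below. The exact separators and the header-name syntax are assumptions based on the description above, not a published grammar:

```python
# Hypothetical parser for the "attackIntervall" notation described above,
# e.g. "query:45-74" or, for headers, "header:Cookie:0-12".
# The precise format (separators, header naming) is assumed, not specified.
def parse_attack_interval(value):
    parts = value.split(":")
    location = parts[0]                      # uri, query, body or header
    if location == "header":
        header_name, span = parts[1], parts[2]
    else:
        header_name, span = None, parts[1]
    start, end = (int(x) for x in span.split("-"))
    return location, header_name, start, end

print(parse_attack_interval("query:45-74"))         # → ('query', None, 45, 74)
print(parse_attack_interval("header:Cookie:0-12"))  # → ('header', 'Cookie', 0, 12)
```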

Query: It will be described with its different components:
  1. method
  2. protocol
  3. uri
  4. query
  5. headers
  6. body
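To make the three-part layout concrete, here is a minimal sketch that reads one sample with Python's standard xml.etree.ElementTree. The element names used (sample, context, class, type, inContext, attackIntervall, query, etc.) are inferred from the description above and may differ from the actual dataset schema:

```python
# Sketch of reading one dataset sample; element names are assumptions
# derived from the textual description, not the real challenge schema.
import xml.etree.ElementTree as ET

SAMPLE = """
<sample id="1">
  <context>
    <os>UNIX</os>
    <webServer>APACHE</webServer>
    <runSQL>TRUE</runSQL>
  </context>
  <class>
    <type>SqlInjection</type>
    <inContext>TRUE</inContext>
    <attackIntervall>query:45-74</attackIntervall>
  </class>
  <query>
    <method>GET</method>
    <protocol>HTTP/1.1</protocol>
    <uri>/index.php</uri>
  </query>
</sample>
"""

def parse_sample(xml_text):
    """Extract the fields a classifier would need from one sample."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "type": root.findtext("class/type"),
        "inContext": root.findtext("class/inContext") == "TRUE",
        "interval": root.findtext("class/attackIntervall"),
    }

print(parse_sample(SAMPLE))
```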


Evaluation Criterion and Winner Selection

Each task will be evaluated as follows:

  1. Task 1:

    Precision and recall are the basic measures used in evaluating search strategies. For the "Analyzing Web Traffic" challenge, the criteria defined by the following formulae will be used.

    $\displaystyle Precision = \frac{\mbox{number of relevant attacks detected}}{\mbox{number of attacks detected}}$     (1)

    $\displaystyle Recall = \frac{\mbox{number of relevant attacks detected}}{\mbox{number of relevant attacks}}$     (2)

    F-measure combines recall and precision in a single efficiency measure.

    $\displaystyle Fmeasure(\beta) = \frac{(\beta^{2}+1)\times Precision \times Recall}{\beta^{2}\times Precision + Recall}$     (3)

    For the challenge, F-measure is calculated with $\beta$ = 1, meaning that the same weight is given to precision and recall.
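As a worked illustration of equations (1)-(3), assume a hypothetical run that flags 120 queries as attacks, 90 of which are relevant, out of 100 relevant attacks in the dataset:

```python
# Worked example of the evaluation formulas, with made-up run statistics.
def f_measure(precision, recall, beta=1.0):
    """Equation (3); beta=1 gives equal weight to precision and recall."""
    return (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)

detected, relevant_detected, relevant_total = 120, 90, 100  # hypothetical run
precision = relevant_detected / detected        # eq. (1): 0.75
recall = relevant_detected / relevant_total     # eq. (2): 0.90
print(round(f_measure(precision, recall), 4))   # → 0.8182
```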


    Winner of Task 1 : $\max(Fmeasure(1))$

    Speed evaluations will only be done for runs whose $Fuzzy\_Fmeasure$ is larger than the average $Fuzzy\_Fmeasure$ (see Task 2). Among these runs, the one that accomplishes the fastest classification will be the winner of Task 1 bis.


    Winner of Task 1 bis: $\min(\mbox{time})$ if $Fuzzy\_Fmeasure(1,3) \ge avg(Fuzzy\_Fmeasure(1,3))$

  2. Task 2:

    For Task 2, the evaluation measure is based on a variant of $Fmeasure$: $Fuzzy\_Fmeasure$. With $Fuzzy\_Fmeasure(\beta,n)$, a string reported as an attack is counted as correct if it matches the relevant attack string to within $n$ characters. In the challenge, $n=3$.

    For instance, suppose the string "ikjllldd" is reported as an attack. If the relevant attack is "ikjlllddio", this result counts as correct when computing $Fuzzy\_Fmeasure(1,3)$.
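One possible reading of this tolerance is sketched below. The precise matching rule (substring containment versus, say, edit distance) is not fully specified here, so this interpretation is an assumption:

```python
# Hypothetical interpretation of the ±n-character tolerance: a detected
# string counts as correct if one string contains the other and their
# lengths differ by at most n characters.
def fuzzy_match(detected, relevant, n=3):
    shorter, longer = sorted((detected, relevant), key=len)
    return shorter in longer and len(longer) - len(shorter) <= n

print(fuzzy_match("ikjllldd", "ikjlllddio"))            # example above → True
print(fuzzy_match("ikjllldd", "completely-different"))  # → False
```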

    $\displaystyle Fuzzy\_Precision(n) = \frac{\mbox{number of relevant attacks detected ($\pm$ $n$ characters)}}{\mbox{number of attacks detected}}$     (4)

    $\displaystyle Fuzzy\_Recall(n) = \frac{\mbox{number of relevant attacks detected ($\pm$ $n$ characters)}}{\mbox{number of relevant attacks}}$     (5)

    $\displaystyle Fuzzy\_Fmeasure(\beta,n) = \frac{(\beta^{2}+1)\times Fuzzy\_Precision(n) \times Fuzzy\_Recall(n)}{\beta^{2}\times Fuzzy\_Precision(n) + Fuzzy\_Recall(n)}$     (6)


    Winner of Task 2 : $\max(Fuzzy\_Fmeasure(1,3))$

Evaluation machine prerequisites

During the Challenge on HTTP attack detection, participants will have at their disposal an evaluation machine to host the contributions competing in the different tasks. A number of prerequisites must be fulfilled for the evaluation to be carried out properly. The machine will be either a PC or a SUN running SunOS (version 9 or 10) or Linux (kernel 2.6.x) as its operating system.

In order to simplify development as much as possible and allow participants to use the programming language of their choice, different compilers and interpreters will be hosted on the machine:

  1. GCC 4.x
  2. Java EE 5 SDK (Java 1.5)
  3. J2EE 1.4 SDK (those who use 1.4 will not be affected)
  4. CPAN Perl Interpreter (stable release 5.8.8)
  5. Python Interpreter (stable release 2.5)

NOTE: Requests to install any other language must be made to the Challenge Team at least one month before the results of the different tasks are sent off.

Command-line UNIX tools such as autoconf, make, sed, awk, gawk, flex, bison and yacc will be available on the task evaluation machine.

To process datasets in XML format, several code libraries exist for the different programming languages hosted on the machine:

  • Java: URLDecoder, present in the various JDKs
  • C/C++: RudeCGI library, version 4.0.1
  • Perl: CGI parse module; HTTP::Parser by David Robins (dbrobins/HTTP-Parser-0.02)
  • Python: provided by the cgi package via cgi.FieldStorage()

Response formats

Results should be stored by participants' programs in an export file, in either XML or text-only format.

  1. XML: participants will send back the class tag containing the query number, attack class, context boolean and attack location (if there is an attack).
  2. Text-only format, with one response per line:
    <SampleId, ClassId, InContext, Location>, where Location is a character interval of the form [QUERY:min-max] or [HEADER:min-max]

Example: <1,3,1,QUERY:45-74> means that query 1 has been classified as SQL injection (i.e. an attack), that the attack is situated in the right context, and that it is located between characters 45 and 74.
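A minimal helper producing lines in this text-only format is sketched below. The field order follows the example above; treating InContext as 1/0 is an assumption drawn from that example:

```python
# Sketch of emitting one text-only response line in the
# <SampleId, ClassId, InContext, Location> format described above.
def format_response(sample_id, class_id, in_context, location=None):
    fields = [str(sample_id), str(class_id), "1" if in_context else "0"]
    if location is not None:      # e.g. "QUERY:45-74" or "HEADER:10-25"
        fields.append(location)
    return "<" + ",".join(fields) + ">"

print(format_response(1, 3, True, "QUERY:45-74"))  # → <1,3,1,QUERY:45-74>
```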

Challenge Committee

The winners will be judged on the basis of the results obtained in the different tasks. The significance of the algorithms will also be assessed: value will be placed on non-straightforward approaches, and the most innovative ideas will be selected by a group of experts for the "Creativity Award".

Cooperation

This Challenge will be organized in cooperation with Bee Ware.
Founded in 2001, Bee Ware is the leading provider of Secure Web Enabled Delivery solutions. Built to open standards, Bee Ware's award-winning appliance-based solutions ensure security, high performance and business continuity for the world's most demanding organisations. Bee Ware is headquartered in Aix-en-Provence, France, and runs operations throughout Europe.




Design: Thomas Heitz - Update: Mathieu Roche