In recent years, there have been many proposals pushing for the use of Machine Learning (ML) in automatic network management. This challenge is one of the first explorations of ML for automatic network analysis. Our goal is to promote the use of ML for network-related tasks in general and, at the same time, to assess the participants’ ability to quickly build a learning-based system with reliable performance. Additionally, one difficulty of using ML for network-related applications is the lack of datasets for training and evaluating different algorithms. The challenge provides one of the few datasets for this field, which may become a reference point for future and more advanced research.

As this is one of the first initiatives in network classification, we started with a relatively simple multi-class, single-label classification task, where the labels are standard applications and the signals are static network parameters. A more detailed description follows.


Discovery Challenge Chairs

  • Elio Masciari, ICAR CNR, Italy
  • Alessandro Moschitti, Qatar Computing Research Institute, HKBU
    (University of Trento, Italy)

Challenge Organizers

  • Daniele Bonadiman, University of Trento, Italy
  • Susanne Greiner, Würth-Phoenix S.r.l., Italy
  • Luca di Stefano, Würth-Phoenix S.r.l., Italy
  • Olga Uryupina, University of Trento, Italy

Sponsored by:

Task & Dataset

The proposed task concerns the automatic analysis of network traffic, which we monitored passively through sensor probes.

The probes measure various Key Performance Indicators (KPIs) and parameters of transmissions generated by many Web Applications of different types. The objective of the challenge is, given a transmission in the network, to predict the type of the application that is transmitting the data. This is a multi-class, single-label classification task.

In more detail, each data point corresponds to one HTTP transmission. The data points were collected over an entire day and then split chronologically into training (20%), validation (20%) and test (20%) sets: the night and morning hours correspond to the training set, whereas the evening hours constitute the test set.

To eliminate possible dependencies between data points, we left a gap of 20% of the data between consecutive sets (training, validation and test). This way, the training, validation and test time slots are not adjacent. The table below shows the exact timespans for each part of the data split:


Dataset      Train                 Validation            Test
Interval     0% – 20%              40% – 60%             80% – 100%
Start Time   2016-02-14 23:00:01   2016-02-15 10:43:44   2016-02-15 15:27:07
End Time     2016-02-15 08:22:35   2016-02-15 13:06:29   2016-02-15 23:00:00
Datapoints   761179                761179                761180
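
A split of this kind can be sketched in a few lines of Python. This is only an illustration, assuming the records are already sorted by timestamp; the function name `chronological_split` and its `bounds` parameter are our own choices for the example and are not part of the released material.

```python
def chronological_split(rows, bounds=((0, 20), (40, 60), (80, 100))):
    """Return one slice of `rows` per (start %, end %) pair in `bounds`.

    `rows` is assumed to be sorted by time. Integer percentages are used
    so that slice boundaries are not affected by floating-point rounding.
    """
    n = len(rows)
    return [rows[n * lo // 100 : n * hi // 100] for lo, hi in bounds]


# Toy usage: ten time-ordered records stand in for the real transmissions.
train, valid, test = chronological_split(list(range(10)))
print(train, valid, test)  # [0, 1] [4, 5] [8, 9]
```

The two unused slices (20% to 40% and 60% to 80%) play the role of the gaps that keep the three sets from being adjacent in time.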


The table below describes the parameters of the released dataset, with their names corresponding to the provided headers. 


Features Description
cli_pl_header HTTP client response header size
cli_pl_body HTTP client response payload size
cli_cont_len HTTP client declared content length (in the header field)
srv_pl_header HTTP server response header size
srv_pl_body HTTP server response payload size
srv_cont_len HTTP server declared content length (in the header field)
aggregated_sessions number of requests aggregated into one entry
bytes Number of bytes transmitted by the client and the server, including the TCP stack header
net_samples — used internally
tcp_frag Number of fragmented packets
tcp_pkts Number of server transmitted packets
tcp_retr Number of retransmitted packets
tcp_ooo Number of out of order packets
cli_tcp_pkts Number of transmitted packets (Client)
cli_tcp_ooo Number of out of order packets (Client)
cli_tcp_retr Number of retransmitted packets (Client)
cli_tcp_frag Number of fragmented packets (Client)
cli_tcp_empty How many empty TCP packets have been transmitted (Client)
cli_win_change How many times the client receive window has been changed
cli_win_zero How many times the client receive window has been closed
cli_tcp_full How many packets with full payload have been transmitted (Client)
cli_tcp_tot_bytes Client TCP total bytes
cli_pl_tot Client total payload
cli_pl_change How many times the payload has been changed (Client)
srv_tcp_pkts Number of server transmitted packets (Server)
srv_tcp_ooo Number of out of order packets (Server)
srv_tcp_retr Number of retransmitted packets (Server)
srv_tcp_frag Number of fragmented packets (Server)
srv_tcp_empty How many empty TCP packets have been transmitted (Server)
srv_win_change How many times the server receive window has been changed
srv_win_zero How many times the server receive window has been closed
srv_tcp_full How many packets with full payload have been transmitted (Server)
srv_tcp_tot_bytes Server TCP total bytes
srv_pl_tot Server total payload
srv_pl_change How many times the payload has been changed (Server)
srv_tcp_win Last server tcp receive window size
srv_tx_time Server data transmission time
cli_tcp_win Last client tcp receive window size
client_latency Estimated packet delay between client and probe
application_latency Calculated application response time
cli_tx_time Client data transmission time
load_time Round-trip time from the start of the client request until all server response data have been received by the client: ≈ application_latency + cli_tx_time + srv_tx_time
server_latency Estimated packet delay between server and probe
proxy Flag indicating whether a proxy has been used
sp_healthscore The healthscore specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput
sp_req_duration Time taken by the server to process the response
sp_is_lat IS latency
sp_error If the protocol server rejects the request because the current processing load on the server exceeds its capacity, the protocol server includes a SharePointError header set to 2 in the response. If the protocol server renders an error page to the client for any other reason, the protocol server includes a SharePointError header set to zero in the response
throughput Bytes/load_time
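
The two derived quantities in the table (`throughput` and the approximation for `load_time`) can be written out as follows. This is a small sketch that assumes each record is available as a dictionary keyed by the feature names above; the helper names and the sample values are hypothetical, and returning 0.0 for a zero `load_time` is our own convention, not something stated for the released data.

```python
def throughput(row):
    """bytes / load_time; 0.0 when load_time is zero (our own convention)."""
    return row["bytes"] / row["load_time"] if row["load_time"] else 0.0


def approx_load_time(row):
    """Sum of the three components that load_time is approximately equal to."""
    return row["application_latency"] + row["cli_tx_time"] + row["srv_tx_time"]


# Hypothetical record; the values are invented for illustration only.
row = {
    "bytes": 1000,
    "load_time": 4,
    "application_latency": 2,
    "cli_tx_time": 1,
    "srv_tx_time": 1,
}
print(throughput(row), approx_load_time(row))  # 250.0 4
```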


Note that (i) we removed some parameters to ensure anonymity and (ii) some transmissions do not specify their application due to the specifics of the monitoring setup. Moreover, some applications send only very few transmissions per day. We labelled all such data points as “Unknown Application” (class ID 0).

Submission and Important Dates

After the submission page opens (see the timeline below), participants may submit up to 5 runs: 2 runs on the development data and 3 runs on the test data. One of the test runs must be designated as the “official” submission to be scored for the competition. The runs must have the same format as the gold-labeled datasets. The “baseline folder” contains an example submission produced by a baseline algorithm; note that this example is computed on the validation data, whereas participants are requested to submit their runs on the test data.

Challenge Timeline:

  • Aug.  12: the challenge starts, registration opens
  • Aug.  12: training and validation data released
  • Sept.  7: test data released, submission page opens
  • Sept. 10: submissions due
  • Sept. 12: Results and Paper invitations
  • Sept. 23: ECML-PKDD 2016 challenge track

Evaluation Criteria

The participants’ systems will be evaluated with the following metrics:

  • Micro-Recall
  • Macro-Recall
  • Micro-Precision
  • Macro-Precision
  • Micro-F1
  • Macro-F1

The final ranking will be derived based on Macro-F1 evaluated on the test set.

None of the measures counts true positives from the “Unknown Application” class (ID number 0).
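
To make these definitions concrete, the sketch below computes the six metrics in plain Python while leaving class 0 out of the counts. It is not the official scorer. It assumes that macro averages are taken over the known classes that occur in the gold labels, that a class which is never predicted has precision 0, and that Macro-F1 is the harmonic mean of Macro-Precision and Macro-Recall. The last assumption is inferred from the baseline figures reported on this page (for Random Forest, 0.9102 and 0.6968 give 0.7893). The names `score` and `harmonic_mean` are ours.

```python
from collections import Counter

UNKNOWN = 0  # class ID of "Unknown Application"


def harmonic_mean(p, r):
    """F1 of a precision/recall pair; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if p + r else 0.0


def score(gold, pred, ignore=UNKNOWN):
    """Micro/macro precision, recall and F1 over all classes except `ignore`."""
    classes = sorted(set(gold) - {ignore})
    tp = Counter(g for g, p in zip(gold, pred) if g == p and g != ignore)
    n_gold = Counter(g for g in gold if g != ignore)
    n_pred = Counter(p for p in pred if p != ignore)

    micro_p = sum(tp.values()) / max(sum(n_pred.values()), 1)
    micro_r = sum(tp.values()) / max(sum(n_gold.values()), 1)
    # Assumption: a class that is never predicted contributes precision 0.
    macro_p = sum(tp[c] / n_pred[c] if n_pred[c] else 0.0 for c in classes)
    macro_p /= max(len(classes), 1)
    macro_r = sum(tp[c] / n_gold[c] for c in classes) / max(len(classes), 1)

    return {
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": harmonic_mean(micro_p, micro_r),
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        # Assumption: harmonic mean of the macro averages (see lead-in).
        "macro_f1": harmonic_mean(macro_p, macro_r),
    }


# Toy usage with made-up labels.
print(score(gold=[0, 1, 1, 2, 2, 2], pred=[1, 1, 0, 2, 2, 1]))
```

Because predictions of class 0 are dropped from the precision denominator and gold items of class 0 from the recall denominator, micro-precision and micro-recall generally differ, as they do in the result tables.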

The challenge scorer `` is located in the download subfolder. To run it, one needs to provide the target file and the submission file. For instance, the following command scores the baseline output we provided:

python ../data/valid_target.csv ../data/valid_dt.csv

Each team may present up to 5 runs; each run will be scored independently. The best-scoring submission will receive a prize of 1000 euros.


We provide several baselines, computed using publicly available ML toolkits (Scikit-learn for decision trees and random forest, and Keras for the Multi-Layer Perceptron) with default parameters.

The table below shows baseline results on the validation set.

Classifier      Micro-Prec.  Micro-Rec.  Micro-F1  Macro-Prec.  Macro-Rec.  Macro-F1
Stratified      0.0245       0.0311      0.0274    0.0099       0.0127      0.0110
Constant (8)    0.0538       0.2810      0.0903    0.0028       0.0526      0.0053
Decision Tree   0.8384       0.7909      0.8140    0.6553       0.7006      0.6772
MLP             0.8232       0.6844      0.7474    0.6819       0.6082      0.6430
Random Forest   0.9650       0.7808      0.8632    0.9102       0.6968      0.7893


  • Stratified: this is a random baseline. It computes the labels for the submission by randomly sampling labels from the distribution of the classes in the training set.
  • Constant (class 8): this assigns the label of the majority class, i.e., class 8 (not considering the unknown class), to all the examples.
  • Decision Tree: this is a strong baseline. It is trained on the training set using scikit-learn with default parameters.
  • Random Forest: this is the strongest baseline in terms of Macro-F1 (the competition’s main evaluation metric). Like the Decision Tree, it is computed using the scikit-learn default parameters.
  • Multi-Layer Perceptron: this is a multi-layer perceptron with two wide hidden layers (double the size of the input layer) and ReLU activations, trained using the Adam optimizer.
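
The two trivial baselines are simple enough to sketch without any ML toolkit. The code below is an illustration only and is not the code used to produce the table: the function names are ours, the random seed is arbitrary, and we assume that the stratified baseline samples from all training labels, including class 0.

```python
import random
from collections import Counter


def constant_baseline(train_labels, n, ignore=0):
    """Predict the most frequent training class (excluding `ignore`) n times."""
    counts = Counter(label for label in train_labels if label != ignore)
    majority = counts.most_common(1)[0][0]
    return [majority] * n


def stratified_baseline(train_labels, n, seed=0):
    """Sample n labels at random, following the training class distribution."""
    counts = Counter(train_labels)
    classes = list(counts)
    weights = [counts[c] for c in classes]
    return random.Random(seed).choices(classes, weights=weights, k=n)


# Toy usage with made-up training labels.
train_labels = [0, 0, 0, 8, 8, 3]
print(constant_baseline(train_labels, 3))  # [8, 8, 8]
print(stratified_baseline(train_labels, 5))
```

On the toy labels, class 0 is the most frequent overall, but the constant baseline still predicts class 8 because the unknown class is not considered when choosing the majority class.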


The following ranking is computed from the evaluation of the primary runs.

Rank  Team name         Micro-Prec.  Micro-Rec.  Micro-F1  Macro-Prec.  Macro-Rec.  Macro-F1
1     UNITN-CogNet      0.9862       0.9745      0.9803    0.9448       0.8521      0.8961
2     IBM-CogNet        0.9777       0.9795      0.9786    0.8965       0.8831      0.8897
3     wistuba+bujna     0.9878       0.973       0.9803    0.9306       0.8509      0.8889
4     FIIT_STU*         0.9874       0.9723      0.9798    0.9221       0.8487      0.8839
5     colastrong*       0.9882       0.9729      0.9805    0.9279       0.8431      0.8835
6     WekaOne           0.9881       0.9691      0.9785    0.9327       0.8342      0.8807
7     TREELOGIC         0.9875       0.9691      0.9782    0.928        0.8354      0.8793
8     Tubthumpers       0.9866       0.9715      0.9790    0.9131       0.8459      0.8782
9     UPMC_Team*        0.9874       0.9715      0.9794    0.918        0.8372      0.8757
10    unocanda          0.9854       0.9130      0.9478    0.9631       0.7923      0.8694
11    Zarmeen           0.9829       0.9101      0.9451    0.9586       0.7920      0.8674
12    Ranger in R       0.9878       0.9630      0.9753    0.9238       0.8120      0.8643
13    sonam19           0.9865       0.9631      0.9747    0.9224       0.8120      0.8637
14    TelematicUDC      0.9859       0.8935      0.9374    0.9754       0.7699      0.8606
15    DeepDiggers       0.9833       0.9085      0.9444    0.9476       0.7873      0.8600
16    DRL-UNITN-Cognet  0.9901       0.9572      0.9734    0.9280       0.7973      0.8577
17    BaCuDan           0.9713       0.9693      0.9703    0.8600       0.8546      0.8573
18    RushGW            0.9869       0.9596      0.9731    0.9211       0.7967      0.8544
19    netoniq           0.9782       0.9036      0.9394    0.9556       0.7675      0.8513
20    CybElt            0.9827       0.9624      0.9724    0.9121       0.7967      0.8505
21    MaiNTM            0.9783       0.8865      0.9301    0.9387       0.7650      0.8430
22    LSI-UFU           0.9843       0.8945      0.9372    0.9294       0.7572      0.8345
23    ETSISI_UPM        0.9657       0.8858      0.9240    0.9218       0.7582      0.8320
-     RF_baseline       0.965        0.7808      0.8632    0.9102       0.6968      0.7893
24    RocketScience     0.8185       0.8156      0.8170    0.7205       0.6210      0.6670
25    ICT_UNIFESP       0.3307       0.9614      0.4921    0.3218       0.8607      0.4685


* Late submission due to formatting problems.


The following material:

  • Train data
  • Validation data
  • Scorer
  • Example submission for the validation data. Note that the participants are not requested to submit any runs on the validation data: this file is only provided as an example of the expected format.

can be downloaded by filling in the form here.


For any request or clarification please contact us at: