In recent years, there have been many proposals to use Machine Learning (ML) for automatic network management. This challenge is one of the first explorations of ML for automatic network analysis. Our goal is to promote the use of ML for network-related tasks in general and, at the same time, to assess the participants’ ability to quickly build a learning-based system with reliable performance. One difficulty of using ML for network-related applications is the lack of datasets for training and evaluating different algorithms. The challenge provides one of the few datasets for this field, which may become a reference point for future and more advanced research.
As this is one of the first initiatives in network classification, we started with a relatively simple multi-class, single-label classification task, where the labels are standard applications and the signals are static network parameters. A more detailed description follows.
Discovery Challenge Chairs
- Elio Masciari, ICAR CNR, Italy
- Alessandro Moschitti, Qatar Computing Research Institute, HBKU (University of Trento, Italy)
- Daniele Bonadiman, University of Trento, Italy
- Susanne Greiner, Würth-Phoenix S.r.l., Italy
- Luca di Stefano, Würth-Phoenix S.r.l., Italy
- Olga Uryupina, University of Trento, Italy
Task & Dataset
The task concerns the automatic analysis of network traffic, which we monitored passively through sensor probes.
The probes measure various Key Performance Indicators (KPIs) and parameters of transmissions generated by many Web applications of different types. Given a transmission in the network, the objective of the challenge is to predict the type of application that is transmitting the data. This is a multi-class, single-label classification task.
In more detail, each data point corresponds to one HTTP transmission. The data points were collected over an entire day and then split chronologically into training (20%), validation (20%) and test (20%) sets: morning hours correspond to the training set, whereas evening hours constitute the test set.
To eliminate possible dependencies between data points, we left a gap of 20% of the data between the training, validation and test sets. This way, the training, validation and test time slots are not adjacent. The table below shows the exact timespans for each part of the data split:
|Split||Training||Validation||Test|
|Interval||0% – 20%||40% – 60%||80% – 100%|
|Start Time||2016-02-14 23:00:01||2016-02-15 10:43:44||2016-02-15 15:27:07|
|End Time||2016-02-15 08:22:35||2016-02-15 13:06:29||2016-02-15 23:00:00|
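The split described above can be illustrated with a minimal Python sketch. It assumes the records are already sorted by timestamp and that the 20% slices are taken by number of data points rather than by wall-clock time. The function name `chronological_split` is hypothetical and not part of the released material.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T]) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered records into train/validation/test with gaps.

    Assumption: `records` is sorted by timestamp. The 0-20%, 40-60% and
    80-100% slices are kept; the 20-40% and 60-80% slices are discarded
    so that the three sets are not adjacent in time.
    """
    n = len(records)
    # Indices at 0%, 20%, 40%, 60%, 80% and 100% of the data.
    bounds = [n * k // 5 for k in range(6)]
    train = list(records[bounds[0]:bounds[1]])
    valid = list(records[bounds[2]:bounds[3]])
    test = list(records[bounds[4]:bounds[5]])
    return train, valid, test


# Toy example: ten integers stand in for one day of transmissions.
train, valid, test = chronological_split(list(range(10)))
print(train, valid, test)  # [0, 1] [4, 5] [8, 9]
```

Discarding the two intermediate slices trades 40% of the data for a cleaner separation between the sets.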
The table below describes the parameters of the released dataset, with their names corresponding to the provided headers.
|cli_pl_header||http client response header size|
|cli_pl_body||http client response payload size|
|cli_cont_len||http client declared content length (in the header field)|
|srv_pl_header||http server response header size|
|srv_pl_body||http server response payload size|
|srv_cont_len||http server declared content length (in the header field)|
|aggregated_sessions||number of requests aggregated into one entry|
|bytes||Number of bytes transmitted from the client and server, including the TCP stack header|
|net_samples||Used internally|
|tcp_frag||Number of fragmented packets|
|tcp_pkts||Number of server transmitted packets|
|tcp_retr||Number of retransmitted packets|
|tcp_ooo||Number of out of order packets|
|cli_tcp_pkts||Number of transmitted packets (Client)|
|cli_tcp_ooo||Number of out of order packets (Client)|
|cli_tcp_retr||Number of retransmitted packets (Client)|
|cli_tcp_frag||Number of fragmented packets (Client)|
|cli_tcp_empty||How many empty TCP packets have been transmitted (Client)|
|cli_win_change||How many times the client receive window has been changed|
|cli_win_zero||How many times the client receive window has been closed|
|cli_tcp_full||How many packets with full payload have been transmitted (Client)|
|cli_tcp_tot_bytes||Client TCP total bytes|
|cli_pl_tot||Client total payload|
|cli_pl_change||How many times the payload has been changed (Client)|
|srv_tcp_pkts||Number of server transmitted packets (Server)|
|srv_tcp_ooo||Number of out of order packets (Server)|
|srv_tcp_retr||Number of retransmitted packets (Server)|
|srv_tcp_frag||Number of fragmented packets (Server)|
|srv_tcp_empty||How many empty TCP packets have been transmitted (Server)|
|srv_win_change||How many times the server receive window has been changed|
|srv_win_zero||How many times the server receive window has been closed|
|srv_tcp_full||How many packets with full payload have been transmitted (Server)|
|srv_tcp_tot_bytes||Server TCP total bytes|
|srv_pl_tot||Server total payload|
|srv_pl_change||How many times the payload has been changed (Server)|
|srv_tcp_win||Last server tcp receive window size|
|srv_tx_time||Server data transmission time|
|cli_tcp_win||Last client tcp receive window size|
|client_latency||Estimated packet delay between client and probe|
|application_latency||Calculated application response time|
|cli_tx_time||Client data transmission time|
|load_time||Round-trip time from the start of the client request until all server response data have been received by the client: approximately application_latency + cli_tx_time + srv_tx_time|
|server_latency||Estimated packet delay between server and probe|
|proxy||Flag indicating whether a proxy was used|
|sp_healthscore||The healthscore specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput|
|sp_req_duration||Time the server needed to produce the response|
|sp_error||If the protocol server rejects the request because the current processing load on the server exceeds its capacity, the protocol server includes a SharePointError header set to 2 in the response. If the protocol server renders an error page to the client for any other reason, the protocol server includes a SharePointError header set to zero in the response|
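The table states that `load_time` is approximately the sum of `application_latency`, `cli_tx_time` and `srv_tx_time`. A minimal Python sketch of how one might check this relation on a single row follows. The helper name and the example values are hypothetical, and the sketch assumes all four fields are expressed in the same time unit, which the dataset description does not state.

```python
from typing import Mapping


def load_time_residual(row: Mapping[str, float]) -> float:
    """Return load_time minus the sum of its three stated components.

    A residual close to zero means the row follows the approximate
    relation given in the parameter table.
    """
    expected = (
        row["application_latency"] + row["cli_tx_time"] + row["srv_tx_time"]
    )
    return row["load_time"] - expected


# Made-up values for illustration only; they are not taken from the dataset.
example = {
    "application_latency": 120.0,
    "cli_tx_time": 5.0,
    "srv_tx_time": 40.0,
    "load_time": 166.0,
}
print(load_time_residual(example))  # 1.0
```

Since the relation is only approximate, such a residual could also be tried as an extra feature rather than being used to drop `load_time`.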
Note that (i) we removed parameters to ensure anonymity and (ii) some transmissions do not specify their application due to the specifics of the monitoring setup. Moreover, some applications send only very few transmissions per day. Thus, we labelled such data points as “Unknown Application” (class ID 0).
Submission and Important Dates
Once submissions open (see the timeline below), participants will be asked to submit up to 5 runs: 2 runs on the development data and 3 runs on the test data. One of them should be designated as the “official” submission to be scored for the competition. The runs should have the same format as the gold-labeled datasets. The “baseline folder” contains an example submission produced by a baseline algorithm. Note that this example is computed on the validation data, whereas participants are asked to submit their runs on the test data.
- Aug. 12: the challenge starts, registration opens
- Aug. 12: training and validation data released
- Sept. 7: test data released, submission page opens
- Sept. 10: submissions due
- Sept. 12: Results and Paper invitations
- Sept. 23: ECML-PKDD 2016 challenge track
The participants’ systems will be evaluated with the following metrics:
The final ranking will be based on Macro-F1 evaluated on the test set.
None of the measures count true positives from the “Unknown Application” class (ID number 0).
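The Python sketch below shows one possible reading of the Macro-F1 measure described above: F1 is computed per class and averaged over all classes except “Unknown Application” (ID 0). It is an illustration under that assumption, not the official scorer, and the function names are hypothetical. The provided `eval.py` remains the authoritative implementation.

```python
from typing import Dict, Sequence

UNKNOWN = 0  # class ID of "Unknown Application" in the challenge data


def per_class_f1(gold: Sequence[int], pred: Sequence[int]) -> Dict[int, float]:
    """Compute F1 for every known class that occurs in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    classes = sorted((set(gold) | set(pred)) - {UNKNOWN})
    scores: Dict[int, float] = {}
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over the known classes."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy labels for illustration only.
gold = [0, 1, 1, 2, 2, 2]
pred = [1, 1, 1, 2, 0, 2]
print(per_class_f1(gold, pred))  # {1: 0.8, 2: 0.8}
print(macro_f1(gold, pred))      # 0.8
```

In this reading, an unknown-class transmission predicted as class 1 still counts as a false positive for class 1, but correct predictions of class 0 earn no credit.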
The challenge scorer `eval.py` is located in the download subfolder. To run it, one needs to provide the target file and the submission file. For instance, the following command scores the baseline output we provided:
python eval.py ../data/valid_target.csv ../data/valid_dt.csv
Each team will be able to present up to 5 runs, and each run will be scored independently. The best-scoring submission will receive a prize of 1000 euros.
We provide several baselines, computed using publicly available ML toolkits (Scikit-learn for decision trees and random forest, and Keras for the Multi-Layer Perceptron) with default parameters.
The table below shows baseline results on the validation set.
- Stratified: a random baseline that produces the submission labels by sampling at random from the class distribution of the training set.
- Constant (class 8): assigns the label of the majority class, i.e., class 8 (not considering the unknown class), to all examples.
- Decision Tree: a strong baseline, trained on the training set using scikit-learn with default parameters.
- Random Forest: the strongest baseline in terms of Macro-F1 (the competition’s main evaluation metric). Like the Decision Tree, it is trained using scikit-learn with default parameters.
- Multi-Layer Perceptron: a multi-layer perceptron with two wide hidden layers (each double the size of the input layer) and ReLU activations, trained using the Adam optimizer.
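The two simplest baselines in the list above can be sketched in plain Python as follows. This is an illustration only: the function names, the fixed seed and the toy labels are assumptions, and the sketch does not reproduce the organizers' actual scripts. In particular, whether the stratified baseline also samples the unknown class is not stated; here it does.

```python
import random
from collections import Counter
from typing import List, Sequence

UNKNOWN = 0  # class ID of "Unknown Application"


def stratified_baseline(train_labels: Sequence[int], n: int, seed: int = 0) -> List[int]:
    """Sample n labels from the empirical class distribution of the training set."""
    rng = random.Random(seed)
    counts = Counter(train_labels)
    classes = sorted(counts)
    weights = [counts[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)


def constant_baseline(train_labels: Sequence[int], n: int) -> List[int]:
    """Predict the most frequent known class (unknown class ignored) n times."""
    counts = Counter(label for label in train_labels if label != UNKNOWN)
    majority = counts.most_common(1)[0][0]
    return [majority] * n


# Toy training labels for illustration only.
toy_labels = [0, 0, 0, 0, 8, 8, 8, 3, 3, 5]
print(constant_baseline(toy_labels, 4))  # [8, 8, 8, 8]
print(stratified_baseline(toy_labels, 4))
```

Both functions need only the training labels and the number of examples to label, which makes them a quick sanity check before training the stronger models.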
The following ranking is computed from the evaluation of the primary runs.
|12||Ranger in R||0.9878||0.9630||0.9753||0.9238||0.8120||0.8643|
* Late submission due to formatting problems.
The following material can be downloaded by filling in the form here:
- Train data
- Validation data
- Example submission for the validation data. Note that the participants are not requested to submit any runs on the validation data: this file is only provided as an example of the expected format.
For any request or clarification please contact us at: firstname.lastname@example.org