NetCla: The ECML-PKDD Network Classification Challenge

In recent years, there have been many proposals pushing for the use of Machine Learning (ML) in automatic network management. This challenge is one of the first explorations of ML for automatic network analysis. Our goal is to promote the use of ML for network-related tasks in general and, at the same time, to assess the participants’ ability to quickly build a learning-based system showing a reliable performance. Additionally, one difficulty of using ML for network-related applications is the lack of datasets for training and evaluating different algorithms. The challenge provides one of the few datasets for this field, which may become a reference point for future and more advanced research.

As this is one of the first initiatives in network classification, we started with a relatively simple multi-class, single-label classification task, where the labels are standard applications and the signals are static network parameters. A more detailed description follows.

Organizers

Discovery Challenge Chairs

Elio Masciari, ICAR CNR, Italy
Alessandro Moschitti, Qatar Computing Research Institute, HBKU
(University of Trento, Italy)

Challenge Organizers

Daniele Bonadiman, University of Trento, Italy
Susanne Greiner, Würth-Phoenix S.r.l., Italy
Luca di Stefano, Würth-Phoenix S.r.l., Italy
Olga Uryupina, University of Trento, Italy

Task & Dataset

The proposed task concerns the automatic analysis of network traffic, which we monitored passively through sensor probes.

Each probe measures various Key Performance Indicators (KPIs) and parameters of transmissions generated by many Web Applications of different types. The objective of the challenge is, given a transmission in the network, to predict the type of the application that is transmitting the data. This is a multi-class, single-label classification task.

In more detail, each data point corresponds to one HTTP transmission. The data points were collected over an entire day and then split chronologically into training (20%), validation (20%) and test (20%) sets: the earliest hours (night and morning) form the training set, whereas the afternoon and evening hours form the test set.

To reduce possible dependencies between data points, we left a gap of 20% of the data between the training and validation sets and between the validation and test sets. This way, the training, validation and test time slots are not adjacent. The table below shows the exact timespans for each part of the data split:

Dataset	Train	Validation	Test
Interval	0% – 20%	40% – 60%	80% – 100%
Start Time	2016-02-14 23:00:01	2016-02-15 10:43:44	2016-02-15 15:27:07
End Time	2016-02-15 08:22:35	2016-02-15 13:06:29	2016-02-15 23:00:00
Datapoints	761179	761179	761180
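
The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chronological_split` is hypothetical, the records are assumed to be already sorted by timestamp, and the rounding at the interval boundaries is a guess, so the resulting counts need not match the table exactly.

```python
def chronological_split(records):
    """Split time-sorted records into train/validation/test with gaps.

    Uses the 0-20%, 40-60% and 80-100% intervals from the table above;
    the 20-40% and 60-80% slices are discarded as gaps.
    """
    n = len(records)
    # Interval bounds expressed in fifths, so all arithmetic stays integer.
    fifths = {"train": (0, 1), "validation": (2, 3), "test": (4, 5)}
    return {
        name: records[n * lo // 5 : n * hi // 5]
        for name, (lo, hi) in fifths.items()
    }


# Toy usage: ten records already sorted by timestamp.
parts = chronological_split(list(range(10)))
print(parts)
```

Because the two gap slices are never used, no training example is temporally adjacent to a validation or test example.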

The table below describes the parameters of the released dataset, with their names corresponding to the provided headers.

Features	Description
cli_pl_header	http client response header size
cli_pl_body	http client response payload size
cli_cont_len	http client declared content length (in the header field)
srv_pl_header	http server response header size
srv_pl_body	http server response payload size
srv_cont_len	http server declared content length (in the header field)
aggregated_sessions	number of requests aggregated into one entry
bytes	Number of bytes transmitted by the client and the server, including the TCP stack header
net_samples	— used internally
tcp_frag	Number of fragmented packets
tcp_pkts	Number of server transmitted packets
tcp_retr	Number of retransmitted packets
tcp_ooo	Number of out of order packets
cli_tcp_pkts	Number of server transmitted packets (Client)
cli_tcp_ooo	Number of out of order packets (Client)
cli_tcp_retr	Number of retransmitted packets (Client)
cli_tcp_frag	Number of fragmented packets (Client)
cli_tcp_empty	How many empty TCP packets have been transmitted (Client)
cli_win_change	How many times the client receive window has been changed
cli_win_zero	How many times the client receive window has been closed
cli_tcp_full	How many packets with full payload have been transmitted (Client)
cli_tcp_tot_bytes	Client TCP total bytes
cli_pl_tot	Client total payload
cli_pl_change	How many times the payload has been changed (Client)
srv_tcp_pkts	Number of server transmitted packets (Server)
srv_tcp_ooo	Number of out of order packets (Server)
srv_tcp_retr	Number of retransmitted packets (Server)
srv_tcp_frag	Number of fragmented packets (Server)
srv_tcp_empty	How many empty TCP packets have been transmitted (Server)
srv_win_change	How many times the server receive window has been changed
srv_win_zero	How many times the server receive window has been closed
srv_tcp_full	How many packets with full payload have been transmitted (Server)
srv_tcp_tot_bytes	Server TCP total bytes
srv_pl_tot	Server total payload
srv_pl_change	How many times the payload has been changed (Server)
srv_tcp_win	Last server tcp receive window size
srv_tx_time	Server data transmission time
cli_tcp_win	Last client tcp receive window size
client_latency	Estimated packet delay between client and probe
application_latency	Calculated application response time
cli_tx_time	Client data transmission time
load_time	Round-trip time from the start of the client request until all server response data have been received by the client: approximately application_latency + cli_tx_time + srv_tx_time
server_latency	Estimated packet delay between server and probe
proxy	Flag indicating whether a proxy has been used
sp_healthscore	The healthscore specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput
sp_req_duration	Time taken by the server to process the response
sp_is_lat	IS latency
sp_error	If the protocol server rejects the request because the current processing load on the server exceeds its capacity, the protocol server includes a SharePointError header set to 2 in the response. If the protocol server renders an error page to the client for any other reason, the protocol server includes a SharePointError header set to zero in the response
throughput	Bytes/load_time
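
Two of the features above are defined in terms of others: `throughput` is `bytes` divided by `load_time`, and `load_time` is roughly the sum of three timing features. The minimal Python sketch below expresses these relations. The function names, the sample values, and the zero-division guard are assumptions made for illustration; the units of the timing fields are not stated on this page.

```python
def approx_load_time(row):
    """Approximate load_time as application_latency + cli_tx_time + srv_tx_time."""
    return row["application_latency"] + row["cli_tx_time"] + row["srv_tx_time"]


def throughput(row):
    """Compute bytes / load_time; returning 0.0 for a zero load_time is an assumption."""
    if row["load_time"] == 0:
        return 0.0
    return row["bytes"] / row["load_time"]


# Hypothetical record; the values are invented for illustration only.
row = {
    "bytes": 12000,
    "load_time": 60,
    "application_latency": 40,
    "cli_tx_time": 5,
    "srv_tx_time": 12,
}
print(approx_load_time(row), throughput(row))
```

Since the relation for `load_time` is only approximate, the measured value and the sum of its components can differ, as they do in this toy record.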

Note that (i) we removed some parameters to ensure anonymity and (ii) some transmissions do not specify their application due to the specifics of the monitoring setup. Moreover, some applications send only very few transmissions per day. We therefore labelled such data points as “Unknown Application” (class ID 0).

Submission and Important Dates

Once submissions open (see the timeline below), participants will be asked to submit up to 5 runs: 2 runs on the development data and 3 runs on the test data. One run should be designated as the “official” submission to be scored for the competition. The runs should have the same format as the gold-labeled datasets. The “baseline folder” contains an example submission produced by a baseline algorithm (note that this example is computed on the validation data, whereas participants are requested to submit their runs on the test data).

Challenge Timeline:

Aug. 12: the challenge starts, registration opens
Aug. 12: training and validation data released
Sept. 7: test data released, submission page opens
Sept. 10: submissions due
Sept. 12: Results and Paper invitations
Sept. 23: ECML-PKDD 2016 challenge track

Evaluation Criteria

The participants’ systems will be evaluated with the following metrics:

Micro-Recall
Macro-Recall
Micro-Precision
Macro-Precision
Micro-F1
Macro-F1

The final ranking will be derived based on Macro-F1 evaluated on the test set.

None of the measures count true positives from the “Unknown Application” class (ID number 0).
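
To make the metrics concrete, here is a minimal pure-Python sketch of micro- and macro-averaged precision, recall and F1 that leaves out class 0. It is not the official `eval.py`; the function name `score` and the handling of edge cases are assumptions. Macro-F1 is computed as the harmonic mean of Macro-Precision and Macro-Recall, which is consistent with the figures reported on this page (for example, 0.6553 and 0.7006 give 0.6772 for the Decision Tree baseline).

```python
from collections import Counter

UNKNOWN = 0  # class ID of "Unknown Application"


def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def score(gold, pred, ignore=UNKNOWN):
    """Micro/macro precision, recall and F1 over all classes except `ignore`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    # Assumption: the ignored class is simply left out of the scored classes.
    classes = sorted((set(gold) | set(pred)) - {ignore})

    # Micro-averaging pools the counts of all scored classes.
    tp_sum = sum(tp[c] for c in classes)
    micro_p = _ratio(tp_sum, tp_sum + sum(fp[c] for c in classes))
    micro_r = _ratio(tp_sum, tp_sum + sum(fn[c] for c in classes))

    # Macro-averaging takes the unweighted mean of per-class scores.
    macro_p = _ratio(sum(_ratio(tp[c], tp[c] + fp[c]) for c in classes), len(classes))
    macro_r = _ratio(sum(_ratio(tp[c], tp[c] + fn[c]) for c in classes), len(classes))

    return {
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": _ratio(2 * micro_p * micro_r, micro_p + micro_r),
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        "macro_f1": _ratio(2 * macro_p * macro_r, macro_p + macro_r),
    }


# Toy usage with three classes (0 is the unknown class).
result = score(gold=[0, 1, 1, 2, 2, 2], pred=[1, 1, 0, 2, 2, 1])
print(result)
```

Note that a wrong prediction on an “Unknown Application” example still counts as a false positive for the predicted class, and predicting class 0 for a known application still counts as a false negative; only the scores of class 0 itself are left out.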

The challenge scorer `eval.py` is located in the download subfolder. To run it, one needs to provide the target file and the submission file. For instance, the following command scores the baseline output we provided:

python eval.py ../data/valid_target.csv ../data/valid_dt.csv

Each team may submit up to 5 runs; each run will be scored independently. The best-scoring submission will receive a prize of 1000 euros.

Baselines

We provide several baselines, computed using publicly available ML toolkits (Scikit-learn for decision trees and random forest, and Keras for the Multi-Layer Perceptron) with default parameters.

The table below shows baseline results on the validation set.

Classifier	Micro-Prec.	Micro-Rec.	Micro-F1	Macro-Prec.	Macro-Rec.	Macro-F1
Stratified	0.0245	0.0311	0.0274	0.0099	0.0127	0.0110
Constant (8)	0.0538	0.2810	0.0903	0.0028	0.0526	0.0053
Decision Tree	0.8384	0.7909	0.8140	0.6553	0.7006	0.6772
MLP	0.8232	0.6844	0.7474	0.6819	0.6082	0.6430
*Random Forest*	0.9650	0.7808	0.8632	0.9102	0.6968	0.7893

Stratified: this is a random baseline. It produces the labels for the submission by randomly sampling from the class distribution of the training set.

Constant (class 8): this assigns the label of the majority class, i.e., class 8 (not considering the unknown class), to all the examples.

Decision Tree: this is a strong baseline. It is trained on the training set using scikit-learn with default parameters.

Random Forest: this is the strongest baseline in terms of Macro-F1 (the competition’s main evaluation metric). Like the Decision Tree, it is trained using scikit-learn with default parameters.

Multi-Layer Perceptron: this is a multi-layer perceptron with two wide hidden layers (double the size of the input layer) and ReLU activations, trained using the Adam optimizer.
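
The two trivial baselines described above (Stratified and Constant) are simple enough to sketch with the Python standard library alone. This is an illustration, not the organizers' code: the function names, the `seed` parameter and the toy labels are assumptions.

```python
import random
from collections import Counter


def stratified_baseline(train_labels, n, seed=0):
    """Sample n labels at random, following the training class distribution."""
    counts = Counter(train_labels)
    classes = sorted(counts)
    weights = [counts[c] for c in classes]
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    return rng.choices(classes, weights=weights, k=n)


def constant_baseline(train_labels, n, unknown=0):
    """Assign the most frequent class, not counting the unknown class, to all n examples."""
    counts = Counter(label for label in train_labels if label != unknown)
    majority = counts.most_common(1)[0][0]
    return [majority] * n


# Toy usage: class 0 is the most frequent overall, class 8 among the known classes.
train = [0, 0, 0, 8, 8, 3]
print(constant_baseline(train, 4))
print(stratified_baseline(train, 5))
```

In this toy example the constant baseline picks class 8 even though class 0 is more frequent, mirroring the "not considering the unknown class" rule above.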

Leaderboard

The following ranking is computed from the evaluation of the primary runs.

Rank	Team name	Micro-Prec.	Micro-Rec.	Micro-F1	Macro-Prec.	Macro-Rec.	Macro-F1
1	UNITN-CogNet	0.9862	0.9745	0.9803	0.9448	0.8521	0.8961
2	IBM-CogNet	0.9777	0.9795	0.9786	0.8965	0.8831	0.8897
3	wistuba+bujna	0.9878	0.973	0.9803	0.9306	0.8509	0.8889
4	FIIT_STU*	0.9874	0.9723	0.9798	0.9221	0.8487	0.8839
5	colastrong*	0.9882	0.9729	0.9805	0.9279	0.8431	0.8835
6	WekaOne	0.9881	0.9691	0.9785	0.9327	0.8342	0.8807
7	TREELOGIC	0.9875	0.9691	0.9782	0.928	0.8354	0.8793
8	Tubthumpers	0.9866	0.9715	0.9790	0.9131	0.8459	0.8782
9	UPMC_Team*	0.9874	0.9715	0.9794	0.918	0.8372	0.8757
10	unocanda	0.9854	0.9130	0.9478	0.9631	0.7923	0.8694
11	Zarmeen	0.9829	0.9101	0.9451	0.9586	0.7920	0.8674
12	Ranger in R	0.9878	0.9630	0.9753	0.9238	0.8120	0.8643
13	sonam19	0.9865	0.9631	0.9747	0.9224	0.8120	0.8637
14	TelematicUDC	0.9859	0.8935	0.9374	0.9754	0.7699	0.8606
15	DeepDiggers	0.9833	0.9085	0.9444	0.9476	0.7873	0.8600
16	DRL-UNITN-Cognet	0.9901	0.9572	0.9734	0.9280	0.7973	0.8577
17	BaCuDan	0.9713	0.9693	0.9703	0.8600	0.8546	0.8573
18	RushGW	0.9869	0.9596	0.9731	0.9211	0.7967	0.8544
19	netoniq	0.9782	0.9036	0.9394	0.9556	0.7675	0.8513
20	CybElt	0.9827	0.9624	0.9724	0.9121	0.7967	0.8505
21	MaiNTM	0.9783	0.8865	0.9301	0.9387	0.7650	0.8430
22	LSI-UFU	0.9843	0.8945	0.9372	0.9294	0.7572	0.8345
23	ETSISI_UPM	0.9657	0.8858	0.9240	0.9218	0.7582	0.8320
—	RF_baseline	0.965	0.7808	0.8632	0.9102	0.6968	0.7893
24	RocketScience	0.8185	0.8156	0.8170	0.7205	0.6210	0.6670
25	ICT_UNIFESP	0.3307	0.9614	0.4921	0.3218	0.8607	0.4685

* Late submission due to formatting problems.

Download

The following material can be downloaded by filling in the form here:

Train data
Validation data
Scorer
Example submission for the validation data. Note that the participants are not requested to submit any runs on the validation data: this file is only provided as an example of the expected format.

Contacts

For any request or clarification please contact us at: ecmlpkdd.ne@gmail.com
