
Network traffic datasets with novel extended IP flow called NetTiSA flow

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8301042
Description:
The datasets were created for the paper "NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification" by Josef Koumar, Karel Hynek, Jaroslav Pešek, and Tomáš Čejka, published in Computer Networks (The International Journal of Computer and Telecommunications Networking): https://doi.org/10.1016/j.comnet.2023.110147

Please cite the usage of our datasets as:

Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286

@article{KOUMAR2024110147,
  title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification},
  journal = {Computer Networks},
  volume = {240},
  pages = {110147},
  year = {2024},
  issn = {1389-1286},
  doi = {https://doi.org/10.1016/j.comnet.2023.110147},
  url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923},
  author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka}
}

This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.

NetTiSA flow feature vector

The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups by computation. The first group of features is based on classical bidirectional flow information: the number of transferred bytes and packets. The second group contains statistical and time-based features calculated using time-series analysis of the packet sequences.
The third group contains features that can be computed from the previous groups (i.e., on the flow collector) and that improve the classification performance without any impact on the telemetry bandwidth.

Flow features

The flow features are:
- Packets is the number of packets in the direction from the source to the destination IP address.
- Packets in reverse order is the number of packets in the direction from the destination to the source IP address.
- Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.
- Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.

Statistical and time-based features

These features are exported in the extended part of the flow. All of them can be computed (exactly or approximately) in a stream-wise manner, which is necessary to keep memory requirements low. The second feature set contains the following features:
- Mean is the mean of the payload lengths of packets.
- Min is the minimal value from the payload lengths of all packets in a flow.
- Max is the maximum value from the payload lengths of all packets in a flow.
- Standard deviation is a measure of the variation of payload lengths from the mean payload length.
- Root mean square is the measure of the magnitude of the payload lengths of packets.
- Average dispersion is the average absolute difference between each payload length and the mean value.
- Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution.
- Mean of relative times is the mean of the relative times, a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, ..., t_n - t_1\}\).
- Mean of time differences is the mean of the time differences, a sequence defined as \(dt = \{ t_j - t_i | j = i + 1, i \in \{1, 2, \dots, n - 1\} \}\).
- Min from time differences is the minimal value from all time differences, i.e., the minimum gap between packets.
- Max from time differences is the maximum value from all time differences, i.e., the maximum gap between packets.
- Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \(tdist = \frac{ \frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{\{dt_{n-1}\}} - dt_i \right| }{ \frac{1}{2} \left(max\left(\{dt_{n-1}\}\right) - min\left(\{dt_{n-1}\}\right) \right) }\)
- Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation \(sr = \frac{s_n}{\frac{1}{2} (n - 1)}\), where \(s_n\) is the number of switches.

Features computed at the collector

The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size, and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:
- Max minus min is the difference between the maximum and minimum payload lengths.
- Percent deviation is the dispersion of the average absolute difference to the mean value.
- Variance is the spread measure of the data from its mean.
- Burstiness is the degree of peakedness in the central part of the distribution.
- Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement.
- Directions describes a percentage ratio of packet direction computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is the number of packets in the direction from the source to the destination IP address and \(d_0\) is the number in the opposite direction. Both \(d_1\) and \(d_0\) are part of the classical bidirectional flow.
- Duration is the duration of the flow.

The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.

Description of dataset files

The following table describes each dataset file:

| File name | Detection problem | Citation of the original raw dataset |
|---|---|---|
| botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022. |
| cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022. |
| dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
| doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020. |
| doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022. |
| dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
| edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
| edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
| https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020. |
| ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
| ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018. |
| unsw_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
| unsw_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
| iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here: https://www.stratosphereips.org/datasets-iot23 |
| ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021. |
| ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021. |
| tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
| tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017. |
| vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
| vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016. |
| vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022. |
| vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022. |
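
The time distribution and switching ratio equations, and several of the collector-side features from the feature description, can be illustrated with a short Python sketch. This is not the ipfixprobe implementation: the function names and dictionary keys are hypothetical and are not the column names used in the CSV files. It also rests on three assumptions that the description does not spell out: a "switch" is counted whenever two consecutive payload lengths differ, percent deviation is taken as the average dispersion divided by the mean, and undefined ratios (zero denominators) are reported as 0.

```python
from statistics import fmean


def time_distribution(timestamps):
    """tdist: mean absolute deviation of the inter-packet gaps,
    normalised by half of their range (max gap minus min gap)."""
    dt = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not dt or max(dt) == min(dt):
        return 0.0  # assumption: report 0 when the ratio is undefined
    mu = fmean(dt)
    mean_abs_dev = fmean([abs(mu - d) for d in dt])
    return mean_abs_dev / (0.5 * (max(dt) - min(dt)))


def switching_ratio(payload_lengths):
    """sr = s_n / (0.5 * (n - 1)), with s_n the number of switches."""
    n = len(payload_lengths)
    if n < 2:
        return 0.0
    # Assumption: a switch is any change between consecutive payload lengths.
    switches = sum(a != b for a, b in zip(payload_lengths, payload_lengths[1:]))
    return switches / (0.5 * (n - 1))


def collector_features(flow):
    """Derive collector-side features from already exported values.
    The keys of `flow` are hypothetical names chosen for this sketch."""
    mean, std = flow["mean"], flow["std"]
    total_packets = flow["packets"] + flow["packets_rev"]
    return {
        "max_minus_min": flow["max"] - flow["min"],
        # Assumption: percent deviation = average dispersion / mean.
        "percent_deviation": flow["average_dispersion"] / mean if mean else 0.0,
        "variance": std ** 2,
        "coefficient_of_variation": std / mean if mean else 0.0,
        # d_1 / (d_1 + d_0): share of packets going source -> destination.
        "directions": flow["packets"] / total_packets if total_packets else 0.0,
    }


# Made-up values, for illustration only.
example_flow = {
    "packets": 3, "packets_rev": 1,
    "mean": 50, "min": 10, "max": 90,
    "std": 20, "average_dispersion": 15,
}
enhanced = collector_features(example_flow)
tdist = time_distribution([0.0, 1.0, 3.0, 4.0])
sr = switching_ratio([100, 100, 200, 100])
```

Because `collector_features` reads only values that are already part of the exported flow, it mirrors the point made in the description: these features can be added at the collector without increasing the telemetry size.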
Created:
2024-04-18