Benchmarking full version of GureKDDCup, UNSW-NB15, and CIDDS-001 NIDS datasets using rolling-origin resampling
Source: DataCite Commons | Updated: 2022-08-29 | Indexed: 2024-07-28
Download link:
https://tandf.figshare.com/articles/dataset/Benchmarking_full_version_of_GureKDDCup_UNSW-NB15_and_CIDDS-001_NIDS_datasets_using_rolling-origin_resampling/16834671/2
Description:
A network intrusion detection system (NIDS) analyses network traffic to flag malicious traffic or suspicious activity. Several NIDS datasets have been published recently; however, the lack of baseline experimental results on the full versions of these datasets has made it difficult for researchers to perform benchmarking. Because the dataset creators have not pre-defined the train-test distribution, researchers are further hindered from comparing the performance of machine learning classifiers without bias. Moreover, the literature has noted that the cross-validation resampling scheme is inappropriate in the NIDS domain. Thus, rolling-origin, a standard resampling technique that is also known as a common cross-validation scheme in the forecasting domain, is employed to allocate the training and testing distributions. In this paper, rigorous experiments are conducted on the full versions of three recent NIDS datasets: GureKDDCup, UNSW-NB15, and CIDDS-001. Although these may not be the latest available datasets, we selected them because they include the essential IP address fields, which are usually missing or removed because of privacy concerns. To deliver baseline empirical results, 10 well-known classifiers from Weka are utilized.
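As a rough illustration of the rolling-origin idea mentioned in the description, the sketch below generates train/test index ranges in which every test block lies strictly after its training block. It is not the authors' implementation: this record does not state the window sizes or whether the training window expands or slides, so the expanding-window behaviour, the function name `rolling_origin_splits`, and the parameters `initial_train`, `horizon`, and `step` are all assumptions made for the example. It also assumes the records are already sorted by time.

```python
from typing import Iterator, Optional, Tuple


def rolling_origin_splits(
    n_samples: int,
    initial_train: int,
    horizon: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Hypothetical helper: uses an expanding training window, so each
    split trains on everything before the origin and tests on the
    next `horizon` records.
    """
    if n_samples <= 0 or initial_train <= 0 or horizon <= 0:
        raise ValueError("n_samples, initial_train and horizon must be positive")
    if step is None:
        step = horizon  # default: consecutive, non-overlapping test blocks
    if step <= 0:
        raise ValueError("step must be positive")

    origin = initial_train
    # Stop once a full test block no longer fits in the data.
    while origin + horizon <= n_samples:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


if __name__ == "__main__":
    # Toy example with 10 time-ordered records.
    for train_idx, test_idx in rolling_origin_splits(10, initial_train=4, horizon=2):
        print(list(train_idx), list(test_idx))
```

Unlike shuffled k-fold cross-validation, no split here trains on records that occur later than the ones it is tested on, which is the property that makes this kind of scheme suitable for time-ordered traffic.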
Provider:
Taylor & Francis
Created:
2021-11-18



