NIDS_Network_Benchmark Dataset APT : A Novel Network Intrusion Detection Dataset with Advanced Persistent Threat Traffic for AI-

Creator: IEEE DataPort
Published: 2026-04-08 08:32:49
License: Not specified

Source: DataCite Commons (updated 2026-04-08, indexed 2026-05-03)

Download link:

https://ieee-dataport.org/documents/nidsnetworkbenchmark-dataset-apt-novel-network-intrusion-detection-dataset-advanced

Description:

Network Intrusion Detection Systems (NIDS) are a critical component of modern cybersecurity infrastructure, and the quality of benchmark datasets directly determines the reliability of the AI-based models trained on them. Existing public datasets such as CICIDS-2017, CICIDS-2019, CIC-IDS-2018, and UNSW-NB15 have well-documented limitations: they contain no encrypted Advanced Persistent Threat (APT) traffic and no IoT protocol attack classes, they leave NaN/Infinity values unresolved in computed flow features, and their generation environments are difficult to reproduce. This paper presents NIDS_Network_Benchmark Dataset APT, a novel flow-level network intrusion detection dataset designed to address these gaps and to serve as a rigorous benchmark for next-generation AI-based firewall and intrusion detection models. The dataset comprises 49,000 labelled network flows across 13 traffic classes, including two classes entirely absent from prior public datasets: encrypted HTTPS-based Command and Control (C2) beaconing, representative of real-world APT lateral movement, and IoT MQTT broker flood attacks on port 1883.

In addition to 84 CICFlowMeter-compatible features that preserve backward compatibility with existing models, the dataset introduces six novel derived features: Byte_Entropy, IAT_Coefficient_Variation, Burstiness_Index, Payload_Ratio_Fwd, TLS_Encrypted_Flag, and Pkt_Size_Skewness. These features are designed to detect evasive encrypted threats that conventional flow statistics cannot distinguish. The dataset is entirely free of NaN and Infinity values (a known flaw in CICIDS-2017) and is accompanied by the full Python generation scripts for reproducibility. A Random Forest baseline achieves a 97.9% macro F1-score, indicating that the dataset is valid and learnable for the research community.
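The description names the six derived features but does not give their formulas. As a rough illustration only, the Python sketch below computes them for a single flow using common textbook definitions, all of which are assumptions rather than the dataset's published definitions: Shannon entropy of the payload bytes (bits per byte), standard deviation divided by mean of the inter-arrival times (IAT), the Goh-Barabasi burstiness index (sigma - mu)/(sigma + mu) on the IATs, the forward share of payload bytes, a TLS record-header heuristic, and the population skewness of packet sizes. The `Packet` record, `derived_features`, and the other helper names are hypothetical. Every ratio returns 0.0 when its denominator is zero, which is one simple way to keep computed features free of NaN and Infinity.

```python
# Sketch only: the feature definitions below are assumptions,
# not the dataset authors' generation scripts.
import math
import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical per-packet record (field names are illustrative)."""
    ts: float       # capture timestamp in seconds
    is_fwd: bool    # True if sent by the flow initiator (forward direction)
    length: int     # total packet length in bytes
    payload: bytes  # transport-layer payload


def safe_div(num: float, den: float) -> float:
    """Return num/den, or 0.0 when den is zero, so no NaN/Infinity leaks out."""
    return num / den if den != 0 else 0.0


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def skewness(xs: list[float]) -> float:
    """Population (Fisher-Pearson) skewness; 0.0 if fewer than 2 values or zero variance."""
    if len(xs) < 2:
        return 0.0
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m3 = statistics.fmean([(x - mu) ** 3 for x in xs])
    return safe_div(m3, m2 ** 1.5)


def looks_like_tls(payload: bytes) -> bool:
    """Assumed heuristic: payload begins with a 5-byte TLS record header,
    i.e. content type 20-23 followed by legacy version major byte 0x03."""
    return len(payload) >= 5 and payload[0] in (20, 21, 22, 23) and payload[1] == 0x03


def derived_features(packets: list[Packet]) -> dict[str, float]:
    """Compute the six derived features for one flow under assumed definitions."""
    pkts = sorted(packets, key=lambda p: p.ts)
    iats = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]
    mu = statistics.fmean(iats) if iats else 0.0
    sigma = statistics.pstdev(iats) if iats else 0.0
    fwd_payload = sum(len(p.payload) for p in pkts if p.is_fwd)
    total_payload = sum(len(p.payload) for p in pkts)
    return {
        "Byte_Entropy": byte_entropy(b"".join(p.payload for p in pkts)),
        "IAT_Coefficient_Variation": safe_div(sigma, mu),
        # Goh-Barabasi burstiness: -1 periodic, 0 Poisson-like, toward +1 bursty.
        "Burstiness_Index": safe_div(sigma - mu, sigma + mu),
        "Payload_Ratio_Fwd": safe_div(fwd_payload, total_payload),
        "TLS_Encrypted_Flag": float(any(looks_like_tls(p.payload) for p in pkts)),
        "Pkt_Size_Skewness": skewness([float(p.length) for p in pkts]),
    }
```

For example, a flow of three packets sent exactly 1 s apart has zero IAT spread, so under these assumed definitions `derived_features` returns `IAT_Coefficient_Variation` = 0.0 and `Burstiness_Index` = -1.0, the value this index assigns to perfectly periodic traffic such as fixed-interval beaconing.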

Provider:

IEEE DataPort

Created:

2026-04-08
