NIDS_Network_Benchmark Dataset APT: A Novel Network Intrusion Detection Dataset with Advanced Persistent Threat Traffic for AI-
Source: DataCite Commons | Updated: 2026-04-08 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/nidsnetworkbenchmark-dataset-apt-novel-network-intrusion-detection-dataset-advanced
Resource description:
Network Intrusion Detection Systems (NIDS) are a critical component of modern cybersecurity infrastructure, and the quality of benchmark datasets directly determines the reliability of AI-based models trained on them. Existing public datasets such as CICIDS-2017, CICIDS-2019, CIC-IDS-2018, and UNSW-NB15 suffer from well-documented limitations, including the absence of encrypted Advanced Persistent Threat (APT) traffic, lack of IoT protocol attack classes, unresolved NaN/Infinity values in computed flow features, and insufficient reproducibility of the generation environment. This paper presents NIDS_Network_Benchmark Dataset APT, a novel flow-level network intrusion detection dataset designed to address these gaps and serve as a rigorous benchmark for next-generation AI-based firewall and intrusion detection models. The dataset comprises 49,000 labelled network flows across 13 traffic classes, including two classes entirely absent from prior public datasets: encrypted HTTPS-based Command and Control (C2) beaconing, representative of real-world APT lateral movement, and IoT MQTT broker flood attacks on port 1883. In addition to 84 CICFlowMeter-compatible features ensuring backward compatibility with existing models, the dataset introduces six novel derived features: Byte_Entropy, IAT_Coefficient_Variation, Burstiness_Index, Payload_Ratio_Fwd, TLS_Encrypted_Flag, and Pkt_Size_Skewness. These features are specifically designed to detect evasive encrypted threats that conventional flow statistics cannot distinguish. The dataset is entirely free of NaN and Infinity values (a known flaw in CICIDS-2017) and is accompanied by full Python generation scripts for reproducibility. A Random Forest baseline achieves 97.9% macro F1-score, confirming dataset validity and learnability for the research community.
Provider:
IEEE DataPort
Created:
2026-04-08
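
The resource description names Byte_Entropy and IAT_Coefficient_Variation among the derived features and states that the dataset contains no NaN or Infinity values, but this listing does not give the feature definitions. The following is only a minimal sketch under assumed definitions: Shannon entropy over payload bytes, and population standard deviation divided by mean for inter-arrival times. The function names `byte_entropy`, `iat_coefficient_variation`, and `non_finite_fields` are hypothetical and are not taken from the dataset's generation scripts.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0).

    Assumed definition; the dataset's Byte_Entropy may be computed differently.
    """
    if not payload:
        return 0.0  # no bytes observed: define entropy as 0 rather than NaN
    total = len(payload)
    counts = Counter(payload)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def iat_coefficient_variation(timestamps: list[float]) -> float:
    """Standard deviation of inter-arrival times divided by their mean.

    Assumed definition. Returns 0.0 where the ratio is undefined (fewer than
    two packets, or a zero mean) so that no NaN or Infinity is produced.
    """
    iats = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not iats:
        return 0.0
    mu = mean(iats)
    if mu == 0:
        return 0.0
    return pstdev(iats) / mu


def non_finite_fields(row: dict[str, float]) -> list[str]:
    """Names of the fields in one flow record whose value is NaN or +/-Infinity."""
    return [name for name, value in row.items() if not math.isfinite(value)]
```

Under these assumed definitions, a strictly periodic beacon has an inter-arrival coefficient of variation of 0, and a payload whose 256 byte values are equally frequent reaches the maximum entropy of 8 bits per byte. A check like `non_finite_fields` could be run over each flow record to confirm the no-NaN, no-Infinity claim independently.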



