
vishwa132/CICIDS-2017

Source: Hugging Face. Updated: 2026-02-19. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/vishwa132/CICIDS-2017
Description:
Raw network data was collected over a period of 5 days, Monday through Friday, and stored in PCAP files. Monday was used to create most of the Benign data, while the Attack-Network carried out various types of attacks over the next 4 days: Brute Force connections (FTP and SSH), several types of DoS attacks, a Botnet attack, Infiltration attacks, and subsequent Port-Scanning activity.

The PCAP data was processed using CICFlowMeter [3], a tool developed by one of the authors of [1]. This tool produces flow traces: sequences of packets between a specific source and destination IP address, with the corresponding source and destination ports. TCP flows are usually terminated by connection teardowns, while UDP flows are terminated by a flow timeout. For each flow trace, many features were extracted that measure flow characteristics such as packet size, number of packets, and flow duration. For some of these variables, statistics such as the mean and standard deviation are provided as features as well. While several features are categorical (such as IP addresses, port numbers, and TCP flag counts), most of the other features are numerical.

The result is the CICIDS-2017 dataset, with about 80 features and several attack families that can ultimately be divided into 16 categories: one Benign category and 15 Attack categories. The original dataset is available at [4].

Subsequently, the authors of [2] put considerable effort into correcting errors in the dataset, by fixing the CICFlowMeter software (especially its handling of TCP flow terminations) and by re-labeling some of the samples accordingly. They posted the corrected dataset on their website [5], which also links to their GitHub site with Python code for importing the data efficiently. I used that as a starting point for my notebook, here on Kaggle.

For each of the 5 days, a CSV file with network flows was produced. These are the files in this dataset, with two changes: I converted the IP addresses to decimal values, and I removed a couple of rows containing inf values.

References
[1] Sharafaldin I., Lashkari A.H., and Ghorbani A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), Volume 1, 108-116, 2018.
[2] Engelen G., Rimmer V., and Joosen W. Troubleshooting an intrusion detection dataset: the CICIDS2017 case study. 2021 IEEE Security and Privacy Workshops (SPW), 7-12, 2021.
[3] https://www.unb.ca/cic/research/applications.html
[4] https://www.unb.ca/cic/datasets/ids-2017.html
[5] https://intrusion-detection.distrinet-research.be/CNS2022/index.html
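
The flow definition given in the description (packets sharing source and destination IP addresses and ports, closed by a TCP teardown or a timeout) can be illustrated with a minimal sketch. This is not CICFlowMeter's actual logic: the `Packet` fields, the 120-second idle timeout, the direction-independent key, and closing a TCP flow on the first FIN are all simplifying assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical packet record; field names are assumptions for this sketch.
Packet = namedtuple("Packet", "ts src_ip src_port dst_ip dst_port proto fin")

FLOW_TIMEOUT = 120.0  # seconds of inactivity; assumed value, not taken from the source


def flow_key(p):
    """Build a direction-independent key so both directions share one flow."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.proto)


def build_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets into flows, closing on a TCP FIN or on an idle timeout."""
    active = {}    # key -> list of packets in the currently open flow
    finished = []  # closed flows, in the order they were closed
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = flow_key(p)
        flow = active.get(key)
        # An idle gap longer than the timeout closes the open flow.
        if flow is not None and p.ts - flow[-1].ts > timeout:
            finished.append(active.pop(key))
            flow = None
        if flow is None:
            flow = active[key] = []
        flow.append(p)
        # Simplified teardown: the first FIN ends a TCP flow.
        if p.proto == "TCP" and p.fin:
            finished.append(active.pop(key))
    finished.extend(active.values())  # flush flows still open at end of capture
    return finished


# Made-up example traffic: one short TCP exchange and two UDP packets far apart.
packets = [
    Packet(0.0, "10.0.0.1", 5000, "10.0.0.2", 80, "TCP", False),
    Packet(0.5, "10.0.0.2", 80, "10.0.0.1", 5000, "TCP", False),
    Packet(1.0, "10.0.0.1", 5000, "10.0.0.2", 80, "TCP", True),
    Packet(2.0, "10.0.0.1", 6000, "10.0.0.2", 53, "UDP", False),
    Packet(200.0, "10.0.0.1", 6000, "10.0.0.2", 53, "UDP", False),
]
flows = build_flows(packets)
print([len(f) for f in flows])
```

In this made-up example the TCP packets form one flow closed by the FIN, and the two UDP packets land in separate flows because the gap between them exceeds the assumed timeout. Real TCP teardown handling is more involved, which is the area the corrections in [2] address.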

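
The two changes mentioned in the description (decimal IP addresses and dropping rows with inf values) could look roughly like the sketch below. It uses only the Python standard library; the column names `src_ip`, `dst_ip`, and `flow_bytes_per_s` and the sample rows are hypothetical, and this is not the uploader's actual notebook code.

```python
import ipaddress
import math


def ip_to_decimal(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its integer (decimal) value."""
    return int(ipaddress.IPv4Address(ip))


def has_inf(row: dict) -> bool:
    """Return True if any float-valued field in the row is +inf or -inf."""
    return any(isinstance(v, float) and math.isinf(v) for v in row.values())


def preprocess(rows, ip_columns=("src_ip", "dst_ip")):
    """Drop rows containing inf and replace IP strings with decimal values."""
    cleaned = []
    for row in rows:
        if has_inf(row):
            continue  # skip rows with infinite feature values
        row = dict(row)  # copy so the input rows are left unchanged
        for col in ip_columns:
            row[col] = ip_to_decimal(row[col])
        cleaned.append(row)
    return cleaned


# Made-up sample rows; column names are assumptions for this sketch.
rows = [
    {"src_ip": "192.168.10.5", "dst_ip": "8.8.8.8", "flow_bytes_per_s": 1200.0},
    {"src_ip": "192.168.10.9", "dst_ip": "8.8.4.4", "flow_bytes_per_s": float("inf")},
]
cleaned = preprocess(rows)
print(cleaned)
```

Here the second sample row is dropped because of its inf value, and the remaining row's addresses become the integers 3232238085 and 134744072.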
Provider:
vishwa132