lacg030175/CIC-IoT-2023-raw

Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/lacg030175/CIC-IoT-2023-raw
Description:

A raw-variant subsample (about 1.3M rows) of the CIC-IoT-2023 dataset that preserves rows containing NaN or ±infinity values; the original variant removes such rows with pandas `DataFrame.dropna`. The dataset is intended for network intrusion detection and cybersecurity work, particularly large-scale attack detection in IoT environments. It contains 1,342,371 rows, of which 50 (0.004%) contain NaN in numeric features. Two split configurations are provided: `random` (80/20) and `random_3way` (80/10/10). Preprocessing coerces feature columns to numeric types and subsamples each attack type to a fixed cap. Using the dataset with `ThermometerEncoder(invalid_encoding="single_bit")` is recommended, so that NaN/±inf is encoded as a learnable state. The underlying data is provided by the Canadian Institute for Cybersecurity, University of New Brunswick, under the CC BY 4.0 license.
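
The idea of coercing cells to numeric values and then encoding NaN/±inf as its own state can be illustrated with a minimal sketch. This is not the implementation of `ThermometerEncoder`, whose internals are not described here; the function names `coerce_numeric` and `thermometer_encode`, the threshold values, and the sample cells are all hypothetical, and the "one extra leading bit" layout is only an assumption about what a single-bit invalid encoding might look like.

```python
import math


def coerce_numeric(value):
    """Convert a raw cell to float; anything unparseable becomes NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def thermometer_encode(value, thresholds):
    """Thermometer-encode one value with one extra leading 'invalid' bit.

    Bit 0 is 1 when the value is NaN or +/-inf (assumed layout, for
    illustration only). Each remaining bit is 1 when the value strictly
    exceeds the corresponding threshold.
    """
    if not math.isfinite(value):
        return [1] + [0] * len(thresholds)
    return [0] + [1 if value > t else 0 for t in thresholds]


# Hypothetical thresholds and raw cells, not taken from the dataset.
thresholds = [10.0, 100.0, 1000.0]
raw_cells = ["64", "1500.5", "inf", "", None, "abc"]

values = [coerce_numeric(cell) for cell in raw_cells]
encoded = [thermometer_encode(v, thresholds) for v in values]

# Rows whose leading bit is set were NaN or +/-inf after coercion.
invalid_count = sum(row[0] for row in encoded)

for cell, row in zip(raw_cells, encoded):
    print(repr(cell), row)
print("invalid cells:", invalid_count)
```

Under these assumptions, invalid cells keep their row in the data and are distinguishable from every finite value, which is the reason to retain them rather than drop them.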
Provided by:
lacg030175