lacg030175/CIC-IoT-2023-neto-full
Source: Hugging Face. Updated: 2026-04-28. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/lacg030175/CIC-IoT-2023-neto-full
Description:
This is the authoritative, canonical CIC-IoT-2023 dataset at the row count published by Neto et al. (2023): 46,686,579 rows × 46 features. It is sourced from the Kaggle mirror `akashdogra/ciciot23csv` (a single 13.75 GB CSV), which was itself derived from CIC's official 169-file distribution. Compared to bencorn's HF mirror (45M rows, 39 features), this dataset preserves all 46 original features (bencorn dropped 7) and the full 46.7M rows (bencorn's re-merge lost ~1.7M). Rows containing NaN or ±inf are kept (no dropna was applied). The dataset provides two splits: `random_3way` (80% train / 10% test / 10% validation, stratified on the binary label, seed=42) and `random` (80% train / 20% test). It preserves both the original `Label` (canonicalized, mixed case) and `Label_orig` (original UPPERCASE), plus a normalized `attack_class` and a binary `label`. Class distribution: DDoS (72.79%), DoS (17.33%), Mirai (5.64%), Benign (2.35%), Spoofing (1.04%), Recon (0.76%), Web-based (0.05%), BruteForce (0.03%).
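To make the `random_3way` description concrete, here is a minimal sketch of an 80/10/10 split stratified on a binary `label` with seed 42, plus a helper for screening out the NaN/±inf rows that the dataset deliberately keeps. It is an illustration on toy data, not the code used to build this dataset: the function names `stratified_3way_split` and `has_only_finite`, the toy `feature` column, and the per-class rounding are all assumptions, and a different random generator will not reproduce the published split row for row.

```python
import math
import random
from collections import defaultdict


def has_only_finite(row, feature_keys):
    """Return True when every listed feature is finite (no NaN, no +/-inf)."""
    return all(math.isfinite(row[k]) for k in feature_keys)


def stratified_3way_split(rows, label_key="label",
                          fractions=(0.8, 0.1, 0.1), seed=42):
    """Split rows into (train, test, validation), keeping the label ratio
    roughly equal in each part. Hypothetical helper, not the dataset's code."""
    rng = random.Random(seed)

    # Group rows by label so each class is split independently.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test, validation = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_test = int(len(group) * fractions[1])
        train.extend(group[:n_train])
        test.extend(group[n_train:n_train + n_test])
        # Whatever remains after train and test goes to validation.
        validation.extend(group[n_train + n_test:])
    return train, test, validation


# Toy data: 1000 rows, 2% benign (label 0) and 98% attack (label 1).
rows = [{"feature": float(i), "label": 0 if i < 20 else 1} for i in range(1000)]
rows[5]["feature"] = float("nan")    # simulate a NaN row that was not dropped
rows[500]["feature"] = float("inf")  # simulate an inf row that was not dropped

train, test, validation = stratified_3way_split(rows)

# Models that cannot handle non-finite values need a filter like this one.
clean_train = [r for r in train if has_only_finite(r, ["feature"])]
```

Stratifying on the binary label matters here because benign traffic is only about 2.35% of the rows; an unstratified random split could leave the smaller test and validation parts with a noticeably different benign share. Note that stratifying on `label` does not guarantee that rare `attack_class` values such as BruteForce (0.03%) are evenly represented.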
Provider:
lacg030175



