lacg030175/CIC-IoT-2023-full
Source: Hugging Face. Updated 2026-04-07; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/lacg030175/CIC-IoT-2023-full
Dataset description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 10M<n<100M
task_categories:
- tabular-classification
tags:
- network-intrusion-detection
- cybersecurity
- CIC-IoT-2023
- IoT
- IDS
- binary-classification
pretty_name: CIC-IoT-2023 Full (46M) IoT Intrusion Detection
configs:
- config_name: random
  data_files:
  - split: train
    path: random/train-*
  - split: test
    path: random/test-*
- config_name: random_3way
  data_files:
  - split: train
    path: random_3way/train-*
  - split: test
    path: random_3way/test-*
  - split: validation
    path: random_3way/validation-*
  default: true
---
# CIC-IoT-2023 Full Dataset (38.5M rows)
The full [CICIoT2023](https://www.unb.ca/cic/datasets/iotdataset-2023.html) dataset: all 38,508,041 rows, with no subsampling.
## Configuration
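### `random_3way` (default): 80/10/10 Three-Way Split
Stratified random split into three disjoint sets:
- **Train (80%)**: 30,806,432 rows, used for model training and architecture search
- **Test (10%)**: 3,850,804 rows, used for threshold calibration (held out from training)
- **Validation (10%)**: 3,850,805 rows, used only for the final reported metrics (never used for training or calibration)

Note that `test` is the calibration split and `validation` is the final evaluation split here, which is the reverse of the common naming convention.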
```python
from datasets import load_dataset

# "random_3way" is the default config; it exposes train, test, and validation splits.
ds = load_dataset("lacg030175/CIC-IoT-2023-full", "random_3way")
```
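The split roles described above (calibrate on `test`, report once on `validation`) can be sketched as follows. This is a minimal illustration, not the dataset author's pipeline: the scores are synthetic stand-ins for a model's output, and choosing the threshold by maximum F1 is an assumption, as are the helper names `f1_at_threshold`, `calibrate_threshold`, and `fake_split`.

```python
import random

def f1_at_threshold(scores, labels, threshold):
    """F1 of the attack class (label 1) when predicting attack for score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_attack = score >= threshold
        if predicted_attack and label == 1:
            tp += 1
        elif predicted_attack and label == 0:
            fp += 1
        elif not predicted_attack and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def calibrate_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest F1 on the calibration split."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))

def fake_split(n, rng):
    """Hypothetical stand-in for model scores on one split (about 97% attack rows)."""
    labels = [1 if rng.random() < 0.97 else 0 for _ in range(n)]
    scores = [min(1.0, max(0.0, rng.gauss(0.3 + 0.4 * y, 0.1))) for y in labels]
    return scores, labels

rng = random.Random(0)
test_scores, test_labels = fake_split(2000, rng)   # plays the role of the `test` split
val_scores, val_labels = fake_split(2000, rng)     # plays the role of the `validation` split

# The threshold is chosen on `test` only, then frozen.
threshold = calibrate_threshold(test_scores, test_labels)
# `validation` is scored once with the frozen threshold.
final_f1 = f1_at_threshold(val_scores, val_labels, threshold)
```

With real data, the synthetic scores would be replaced by model outputs on `ds["test"]` and `ds["validation"]`; the point of the sketch is only that the threshold never sees the split used for reporting.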
## Class Distribution
| Class | Count | % |
|---|---:|---:|
| Benign | 1,098,126 | 2.9% |
| Attack | 37,409,915 | 97.1% |
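Because benign traffic is under 3% of the rows, a classifier trained on the raw distribution may need class weighting. The sketch below derives inverse-frequency weights from the counts in the table, using the common "balanced" heuristic `total / (n_classes * class_count)`. Whether the dataset author used any weighting is not stated, so treat this as one possible approach; `balanced_class_weights` is a hypothetical helper.

```python
# Row counts copied from the class distribution table.
CLASS_COUNTS = {"Benign": 1_098_126, "Attack": 37_409_915}

def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}

weights = balanced_class_weights(CLASS_COUNTS)

# How many attack rows exist for each benign row.
imbalance_ratio = CLASS_COUNTS["Attack"] / CLASS_COUNTS["Benign"]

for name, weight in weights.items():
    print(f"{name}: weight {weight:.3f}")
print(f"attack:benign ratio {imbalance_ratio:.1f}:1")
```

With these weights each class contributes the same total weight to the loss, which comes to about 17.5 per benign row and about 0.51 per attack row.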
## Comparison with Subsampled Version
| Version | Rows | Benign % | Purpose |
|---|---:|---:|---|
| `lacg030175/CIC-IoT-2023` | ~1.3M | ~15% | Fast experiments (112 runs) |
| **This dataset** | **38,508,041** | **2.9%** | **Literature comparison** |
## Top-20 RF Features
1. Number
2. ack_flag_number
3. Header_Length
4. HTTPS
5. Time_To_Live
6. psh_flag_number
7. Rate
8. IAT
9. Max
10. ack_count
11. Tot sum
12. Variance
13. Tot size
14. Std
15. Min
16. AVG
17. syn_count
18. syn_flag_number
19. HTTP
20. fin_count
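To restrict a model to the 20 features listed above, the list can be kept as a constant and checked against the columns actually present. The sketch assumes the column names in the dataset match the list exactly, including the spaces in `Tot sum` and `Tot size`; `missing_features` and `project_row` are hypothetical helpers.

```python
# Feature names copied from the list above, in ranked order.
# Assumption: these strings match the dataset's column names exactly.
TOP20_FEATURES = [
    "Number", "ack_flag_number", "Header_Length", "HTTPS", "Time_To_Live",
    "psh_flag_number", "Rate", "IAT", "Max", "ack_count",
    "Tot sum", "Variance", "Tot size", "Std", "Min",
    "AVG", "syn_count", "syn_flag_number", "HTTP", "fin_count",
]

def missing_features(column_names, wanted=TOP20_FEATURES):
    """Return the wanted features that are absent from column_names."""
    present = set(column_names)
    return [name for name in wanted if name not in present]

def project_row(row, wanted=TOP20_FEATURES):
    """Turn a dict-like row into a feature vector in ranked order."""
    return [row[name] for name in wanted]

# Example with a made-up row that carries every feature plus a label.
example_row = {name: 0.0 for name in TOP20_FEATURES}
example_row["label"] = 1
vector = project_row(example_row)
```

Calling `missing_features` on a split's column names before training is a cheap way to catch a naming mismatch early.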
## Attack Types (7 classes, 33 sub-types)
| Class | Sub-types |
|---|---|
| Benign | BenignTraffic |
| BruteForce | DictionaryBruteForce |
| DDoS | 12 sub-types |
| DoS | 4 sub-types |
| Mirai | 3 sub-types |
| Recon | 5 sub-types |
| Spoofing | 2 sub-types |
| Web-based | 6 sub-types |
## Labels
- **Binary** (`label`): 0 = Benign, 1 = Attack
- **Multi-class** (`Label`): 34 categories
- **Grouped** (`attack_class`): 8 classes
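The three label granularities are consistent with the attack-type table: 33 attack sub-types plus benign traffic give 34 categories, and 7 attack classes plus Benign give 8 grouped classes. The sketch below encodes that table and derives the binary label from the grouped class. It assumes the `attack_class` values are spelled exactly as in the table; `to_binary` is a hypothetical helper.

```python
# Sub-type counts per grouped class, copied from the attack-type table.
# Assumption: `attack_class` values use exactly these spellings.
SUBTYPES_PER_CLASS = {
    "Benign": 1,
    "BruteForce": 1,
    "DDoS": 12,
    "DoS": 4,
    "Mirai": 3,
    "Recon": 5,
    "Spoofing": 2,
    "Web-based": 6,
}

def to_binary(attack_class):
    """Map a grouped class name to the binary label: 0 = Benign, 1 = Attack."""
    if attack_class not in SUBTYPES_PER_CLASS:
        raise ValueError(f"unknown attack_class: {attack_class!r}")
    return 0 if attack_class == "Benign" else 1

n_grouped = len(SUBTYPES_PER_CLASS)               # grouped classes
n_categories = sum(SUBTYPES_PER_CLASS.values())   # fine-grained categories
n_attack_subtypes = n_categories - SUBTYPES_PER_CLASS["Benign"]
```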
## Citation
```bibtex
@article{neto2023ciciot,
  title={CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment},
  author={Neto, Euclides Carlos Pinto and others},
  journal={Sensors},
  volume={23},
  number={13},
  year={2023},
  publisher={MDPI}
}
```
## License
CC BY 4.0 — original dataset by the Canadian Institute for Cybersecurity, University of New Brunswick.
Provided by:
lacg030175



