lacg030175/CIC-IoT-2023
Source: Hugging Face (updated 2026-04-02, indexed 2026-03-29)
Download link: https://hf-mirror.com/datasets/lacg030175/CIC-IoT-2023
Description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1M<n<10M
task_categories:
- tabular-classification
tags:
- network-intrusion-detection
- cybersecurity
- CIC-IoT-2023
- IoT
- IDS
- binary-classification
pretty_name: CIC-IoT-2023 IoT Intrusion Detection
configs:
- config_name: random_3way
  data_files:
  - split: train
    path: random_3way/train-*
  - split: test
    path: random_3way/test-*
  - split: validation
    path: random_3way/validation-*
  default: true
- config_name: random
  data_files:
  - split: train
    path: random/train-*
  - split: test
    path: random/test-*
---
# CIC-IoT-2023 IoT Intrusion Detection Dataset
This repository contains the [CICIoT2023](https://www.unb.ca/cic/datasets/iotdataset-2023.html) dataset from the Canadian Institute for Cybersecurity, subsampled and preprocessed for machine learning evaluation.
## Configurations
### `random_3way` (default) — 80/10/10 Three-Way Split
Stratified random split with fully separated train/test/validation sets:
- **Train (80%)**: Model training and architecture search
- **Test (10%)**: Threshold calibration (held out from training)
- **Validation (10%)**: Final reported metrics (never touched during training or calibration)
```python
from datasets import load_dataset
ds = load_dataset("lacg030175/CIC-IoT-2023", "random_3way")
# ds["train"]: 1,073,851 rows
# ds["test"]: 134,231 rows
# ds["validation"]: 134,232 rows
```
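The roles of the three splits can be illustrated with a small sketch: a decision threshold is chosen on `test`, then metrics are computed once on `validation`. The scores and labels below are toy values, `f1_at` and `calibrate_threshold` are hypothetical helper names, and maximizing F1 is only one possible calibration criterion (the card does not say which one is used).

```python
def f1_at(threshold, scores, labels):
    """F1 for the attack class (label 1) when predicting 1 if score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(scores, labels):
    """Return the observed score that gives the best F1 when used as threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


# Toy model scores standing in for predictions on the "test" split (calibration).
test_scores = [0.10, 0.35, 0.40, 0.80, 0.90]
test_labels = [0, 0, 1, 1, 1]
threshold = calibrate_threshold(test_scores, test_labels)

# Toy scores standing in for the "validation" split (final reporting only).
val_scores = [0.20, 0.45, 0.70, 0.30]
val_labels = [0, 1, 1, 1]
final_f1 = f1_at(threshold, val_scores, val_labels)
```

Because the threshold is fixed before `validation` is touched, the final number is not tuned on the data it is reported on.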
### `random` (legacy) — 80/20 Split
Original 80/20 split for backward compatibility with existing runs.
```python
from datasets import load_dataset
ds = load_dataset("lacg030175/CIC-IoT-2023", "random")
# ds["train"]: 1,073,851 rows
# ds["test"]: 268,463 rows
```
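A stratified random split like the ones in this section could be produced roughly as follows. This is a standard-library sketch, not the procedure actually used to build the configs: `stratified_split` is a hypothetical helper, and the seed and rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split row indices into len(fractions) parts, keeping class proportions.

    `labels` holds one class label per row. Returns one index list per
    fraction; the last part receives any rounding remainder.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    parts = [[] for _ in fractions]
    for indices in by_class.values():
        rng.shuffle(indices)
        start = 0
        for i, frac in enumerate(fractions):
            if i == len(fractions) - 1:
                end = len(indices)
            else:
                end = start + int(round(frac * len(indices)))
            parts[i].extend(indices[start:end])
            start = end
    return parts


# Toy labels: 20 benign (0) and 80 attack (1) rows.
labels = [0] * 20 + [1] * 80
train_idx, test_idx, val_idx = stratified_split(labels)
```

Passing `fractions=(0.8, 0.2)` gives a two-way split in the style of the legacy config.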
## Subsampling Strategy
The original dataset has 46.7M rows (97.6% attack traffic). To create a manageable benchmark:
- **Benign**: up to 200,000 rows
- **Each attack type**: up to 50,000 rows
- **Total**: 1,342,314 rows (199,988 benign, 1,142,326 attack)
This preserves all 33 attack types while reducing the class imbalance for binary classification: attack traffic drops from 97.6% of the original rows to about 85.1% of the subsample.
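
The capping rule above can be sketched as a per-class subsample. This is an illustration only: `capped_subsample` is a hypothetical helper, the seed is an assumption, the label strings are toy values, and the caps are scaled down from 200,000 and 50,000 so the example stays small.

```python
import random
from collections import defaultdict


def capped_subsample(rows, label_key, benign_label, benign_cap, attack_cap, seed=0):
    """Keep at most `benign_cap` benign rows and `attack_cap` rows per attack type."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    kept = []
    for label, group in by_label.items():
        cap = benign_cap if label == benign_label else attack_cap
        # Classes smaller than the cap are kept whole, as "up to" implies.
        kept.extend(group if len(group) <= cap else rng.sample(group, cap))
    return kept


# Toy data with caps of 4 (benign) and 2 (per attack type).
rows = (
    [{"Label": "BenignTraffic"}] * 10
    + [{"Label": "DDoS-SYN_Flood"}] * 7
    + [{"Label": "XSS"}] * 1
)
sample = capped_subsample(rows, "Label", "BenignTraffic", benign_cap=4, attack_cap=2)
counts = defaultdict(int)
for row in sample:
    counts[row["Label"]] += 1
```

With 33 attack types, a cap of 50,000 allows at most 1,650,000 attack rows; the reported 1,142,326 is consistent with some attack types having fewer than 50,000 rows available.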
## Top-20 RF Features
1. HTTPS
2. Number
3. Time_To_Live
4. Max
5. ack_flag_number
6. Rate
7. IAT
8. ack_count
9. Header_Length
10. Min
11. Variance
12. psh_flag_number
13. Tot sum
14. Std
15. Tot size
16. syn_count
17. AVG
18. rst_flag_number
19. DNS
20. rst_count
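
To train on only the ranked features, rows can be projected onto this list. The sketch below assumes the dataset's column names match the spellings above exactly (including the spaces in `Tot sum` and `Tot size`); `select_features` is a hypothetical helper and the row is a toy example.

```python
TOP20_RF_FEATURES = [
    "HTTPS", "Number", "Time_To_Live", "Max", "ack_flag_number",
    "Rate", "IAT", "ack_count", "Header_Length", "Min",
    "Variance", "psh_flag_number", "Tot sum", "Std", "Tot size",
    "syn_count", "AVG", "rst_flag_number", "DNS", "rst_count",
]


def select_features(row, feature_names=TOP20_RF_FEATURES):
    """Project one row (a dict) onto the given features, in ranked order."""
    missing = [name for name in feature_names if name not in row]
    if missing:
        raise KeyError(f"row is missing expected columns: {missing}")
    return [row[name] for name in feature_names]


# Toy row: every listed feature set to 0.0, plus a label column that is dropped.
toy_row = {name: 0.0 for name in TOP20_RF_FEATURES}
toy_row["HTTPS"] = 1.0
toy_row["label"] = 1
vector = select_features(toy_row)
```

Raising on missing columns makes a naming mismatch visible immediately instead of silently producing shorter vectors.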
## Attack Types (7 attack classes, 33 sub-types, plus Benign)
| Class | Sub-types |
|---|---|
| Benign | BenignTraffic |
| BruteForce | DictionaryBruteForce |
| DDoS | ACK_Fragmentation, HTTP_Flood, ICMP_Flood/Frag, PSHACK, RSTFINFlood, SlowLoris, SYN_Flood, SynonymousIP, TCP_Flood, UDP_Flood/Frag |
| DoS | HTTP_Flood, SYN_Flood, TCP_Flood, UDP_Flood |
| Mirai | greeth_flood, greip_flood, udpplain |
| Recon | HostDiscovery, OSScan, PingSweep, PortScan, VulnerabilityScan |
| Spoofing | DNS_Spoofing, MITM-ArpSpoofing |
| Web-based | Backdoor_Malware, BrowserHijacking, CommandInjection, SqlInjection, Uploading_Attack, XSS |
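
The table can be written as a lookup structure, which also makes the counts easy to check. The names follow the table as printed, with `ICMP_Flood/Frag` and `UDP_Flood/Frag` expanded into separate flood and fragmentation entries; the exact label strings stored in the dataset may differ (for example by carrying a class prefix), so treat them as assumptions. `classes_for` is a hypothetical helper.

```python
ATTACK_CLASSES = {
    "BruteForce": ["DictionaryBruteForce"],
    "DDoS": [
        "ACK_Fragmentation", "HTTP_Flood", "ICMP_Flood", "ICMP_Fragmentation",
        "PSHACK", "RSTFINFlood", "SlowLoris", "SYN_Flood", "SynonymousIP",
        "TCP_Flood", "UDP_Flood", "UDP_Fragmentation",
    ],
    "DoS": ["HTTP_Flood", "SYN_Flood", "TCP_Flood", "UDP_Flood"],
    "Mirai": ["greeth_flood", "greip_flood", "udpplain"],
    "Recon": ["HostDiscovery", "OSScan", "PingSweep", "PortScan", "VulnerabilityScan"],
    "Spoofing": ["DNS_Spoofing", "MITM-ArpSpoofing"],
    "Web-based": [
        "Backdoor_Malware", "BrowserHijacking", "CommandInjection",
        "SqlInjection", "Uploading_Attack", "XSS",
    ],
}


def classes_for(subtype):
    """Return every attack class whose sub-type list contains `subtype`."""
    return [cls for cls, subs in ATTACK_CLASSES.items() if subtype in subs]


n_classes = len(ATTACK_CLASSES)
n_subtypes = sum(len(subs) for subs in ATTACK_CLASSES.values())
```

Short names such as `SYN_Flood` appear under both DDoS and DoS, so a sub-type name alone does not identify the class; `classes_for("SYN_Flood")` returns both.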
## Labels
- **Binary** (`label`): 0 = Benign, 1 = Attack
- **Multi-class** (`Label`): 34 categories (33 fine-grained attack types + benign traffic)
- **Grouped** (`attack_class`): 8 classes (7 attack groups + Benign)
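
The binary label follows from the grouped one, which allows a quick consistency check across the label columns. The sketch below uses toy rows; the string values (`"Benign"`, `"DDoS-SYN_Flood"`, and so on) are illustrative assumptions rather than values read from the dataset, and both helpers are hypothetical.

```python
def binary_from_class(attack_class, benign_name="Benign"):
    """Return 0 for benign traffic and 1 for any attack group."""
    return 0 if attack_class == benign_name else 1


def is_consistent(row):
    """Check that the binary `label` agrees with the grouped `attack_class`."""
    return row["label"] == binary_from_class(row["attack_class"])


# Toy rows; the last one is deliberately inconsistent.
rows = [
    {"label": 0, "Label": "BenignTraffic", "attack_class": "Benign"},
    {"label": 1, "Label": "DDoS-SYN_Flood", "attack_class": "DDoS"},
    {"label": 0, "Label": "XSS", "attack_class": "Web-based"},
]
flags = [is_consistent(r) for r in rows]
```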
## Features
39 numeric flow-level features.
## Note on Temporal Split
Unlike UNSW-NB15 and CICIDS2017, CIC-IoT-2023 has no natural temporal ordering
(the data is organized by attack type, not by capture time), so only random splits are provided.
## Citation
```bibtex
@article{neto2023ciciot,
title={CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment},
author={Neto, Euclides Carlos Pinto and others},
journal={Sensors},
volume={23},
number={13},
year={2023},
publisher={MDPI}
}
```
## License
CC BY 4.0 — original dataset by the Canadian Institute for Cybersecurity, University of New Brunswick.
Provided by: lacg030175



