
lacg030175/CIC-IoT-2023

Source: Hugging Face | Updated: 2026-04-02 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/lacg030175/CIC-IoT-2023
Description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1M<n<10M
task_categories:
- tabular-classification
tags:
- network-intrusion-detection
- cybersecurity
- CIC-IoT-2023
- IoT
- IDS
- binary-classification
pretty_name: CIC-IoT-2023 IoT Intrusion Detection
configs:
- config_name: random_3way
  data_files:
  - split: train
    path: random_3way/train-*
  - split: test
    path: random_3way/test-*
  - split: validation
    path: random_3way/validation-*
  default: true
- config_name: random
  data_files:
  - split: train
    path: random/train-*
  - split: test
    path: random/test-*
---

# CIC-IoT-2023 IoT Intrusion Detection Dataset

The [CICIoT2023](https://www.unb.ca/cic/datasets/iotdataset-2023.html) dataset from the Canadian Institute for Cybersecurity, subsampled and preprocessed for machine learning evaluation.

## Configurations

### `random_3way` (default): 80/10/10 Three-Way Split

Stratified random split with fully separated train/test/validation sets:

- **Train (80%)**: Model training and architecture search
- **Test (10%)**: Threshold calibration (held out from training)
- **Validation (10%)**: Final reported metrics (never touched during training or calibration)

Note that `test` is used for calibration and `validation` for final reporting, which is the reverse of the more common naming.

```python
from datasets import load_dataset

ds = load_dataset("lacg030175/CIC-IoT-2023", "random_3way")
# ds["train"]: 1,073,851 rows
# ds["test"]: 134,231 rows
# ds["validation"]: 134,232 rows
```

### `random` (legacy): 80/20 Split

Original 80/20 split, kept for backward compatibility with existing runs.

```python
from datasets import load_dataset

ds = load_dataset("lacg030175/CIC-IoT-2023", "random")
# ds["train"]: 1,073,851 rows
# ds["test"]: 268,463 rows
```

## Subsampling Strategy

The original dataset has 46.7M rows (97.6% attack traffic). To create a manageable benchmark:

- **Benign**: up to 200,000 rows
- **Each attack type**: up to 50,000 rows
- **Total**: 1,342,314 rows (199,988 benign, 1,142,326 attack)

This preserves all 33 attack types while reducing the attack share from 97.6% to about 85%, which eases the class imbalance for binary classification.

## Top-20 RF Features

1. HTTPS
2. Number
3. Time_To_Live
4. Max
5. ack_flag_number
6. Rate
7. IAT
8. ack_count
9. Header_Length
10. Min
11. Variance
12. psh_flag_number
13. Tot sum
14. Std
15. Tot size
16. syn_count
17. AVG
18. rst_flag_number
19. DNS
20. rst_count

## Attack Types (7 attack classes, 33 sub-types)

The table also lists the Benign class, which brings the totals to 8 grouped classes and 34 fine-grained labels.

| Class | Sub-types |
|---|---|
| Benign | BenignTraffic |
| BruteForce | DictionaryBruteForce |
| DDoS | ACK_Fragmentation, HTTP_Flood, ICMP_Flood/Frag, PSHACK, RSTFINFlood, SlowLoris, SYN_Flood, SynonymousIP, TCP_Flood, UDP_Flood/Frag |
| DoS | HTTP_Flood, SYN_Flood, TCP_Flood, UDP_Flood |
| Mirai | greeth_flood, greip_flood, udpplain |
| Recon | HostDiscovery, OSScan, PingSweep, PortScan, VulnerabilityScan |
| Spoofing | DNS_Spoofing, MITM-ArpSpoofing |
| Web-based | Backdoor_Malware, BrowserHijacking, CommandInjection, SqlInjection, Uploading_Attack, XSS |

## Labels

The binary and multi-class columns differ only in capitalization (`label` vs. `Label`):

- **Binary** (`label`): 0 = Benign, 1 = Attack
- **Multi-class** (`Label`): 34 categories (fine-grained attack types plus benign traffic)
- **Grouped** (`attack_class`): 8 classes (7 attack groups + Benign)

## Features

39 numeric flow-level features.

## Note on Temporal Split

Unlike UNSW-NB15 and CICIDS2017, CIC-IoT-2023 does not have a natural temporal ordering (data is organized by attack type, not capture time). Only a random split is provided.

## Citation

```bibtex
@article{neto2023ciciot,
  title={CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment},
  author={Neto, Euclides Carlos Pinto and others},
  journal={Sensors},
  volume={23},
  number={13},
  year={2023},
  publisher={MDPI}
}
```

## License

CC BY 4.0. Original dataset by the Canadian Institute for Cybersecurity, University of New Brunswick.
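
## Sketch: Capped Per-Class Subsampling

The per-class caps described under Subsampling Strategy can be illustrated with a minimal sketch. The card does not say how rows were drawn, so the seeded uniform sampling, the function name `capped_subsample`, and the toy rows are assumptions made for illustration. Only the `Label` column, the `BenignTraffic` value, and the two default caps come from the card.

```python
import random
from collections import Counter, defaultdict


def capped_subsample(rows, label_key="Label", benign_name="BenignTraffic",
                     benign_cap=200_000, attack_cap=50_000, seed=0):
    """Keep at most benign_cap benign rows and attack_cap rows per attack type.

    Hypothetical helper: uniform sampling with a fixed seed is an assumption,
    not the documented procedure.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        cap = benign_cap if label == benign_name else attack_cap
        # Classes at or below the cap are kept whole ("up to" semantics).
        sampled.extend(group if len(group) <= cap else rng.sample(group, cap))
    return sampled


# Toy data with small caps so the effect is visible.
toy = (
    [{"Label": "BenignTraffic", "id": i} for i in range(10)]
    + [{"Label": "PortScan", "id": i} for i in range(100)]
    + [{"Label": "XSS", "id": i} for i in range(3)]
)
subset = capped_subsample(toy, benign_cap=8, attack_cap=5)
counts = Counter(row["Label"] for row in subset)
print(dict(counts))
```

With these toy caps, the oversized `PortScan` class is cut to 5 rows, benign traffic to 8, and the small `XSS` class is kept intact, which mirrors how rare attack types survive the capping.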
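
## Sketch: Stratified 80/10/10 Split

The `random_3way` configuration is described as a stratified random split. A minimal sketch of that idea follows, assuming stratification on a single label column and a fixed seed. Neither the stratification column nor the seed is stated in the card, and `stratified_three_way` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_three_way(labels, train_pct=80, test_pct=10, seed=0):
    """Return (train, test, validation) index lists, split within each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)

    train, test, validation = [], [], []
    for y in sorted(by_label):
        idxs = by_label[y]
        rng.shuffle(idxs)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(idxs) * train_pct // 100
        n_test = len(idxs) * test_pct // 100
        train += idxs[:n_train]
        test += idxs[n_train:n_train + n_test]
        # Whatever remains goes to validation, so no row is dropped.
        validation += idxs[n_train + n_test:]
    return train, test, validation


# Toy labels: 20 benign (0) and 80 attack (1) rows.
toy_labels = [0] * 20 + [1] * 80
train_idx, test_idx, val_idx = stratified_three_way(toy_labels)
print(len(train_idx), len(test_idx), len(val_idx))  # 80 10 10
```

Because the split is done per label value, each of the three parts keeps the same 20/80 class ratio as the toy input.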
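
## Sketch: Calibrating on `test`, Reporting on `validation`

The split roles in `random_3way` (threshold calibration on `test`, final metrics on `validation`) can be sketched as below. The card does not name a calibration criterion, so maximizing F1 is an assumption here. The scores are made-up stand-ins for the outputs of an already-trained binary model, and `f1_at` and `calibrate_threshold` are hypothetical helpers.

```python
def f1_at(scores, labels, threshold):
    """F1 for the attack class (label 1) when predicting 1 if score >= threshold."""
    tp = fp = fn = 0
    for score, y in zip(scores, labels):
        pred = 1 if score >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def calibrate_threshold(scores, labels):
    """Pick the observed score that maximizes F1 on the calibration split."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


# Made-up model scores for the calibration ("test") split.
test_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
test_labels = [0, 0, 1, 1, 1]
threshold = calibrate_threshold(test_scores, test_labels)

# The threshold is then frozen and applied once to the "validation" split.
val_scores = [0.2, 0.5, 0.3, 0.7]
val_labels = [0, 1, 0, 1]
val_f1 = f1_at(val_scores, val_labels, threshold)
print(threshold, val_f1)  # 0.35 1.0
```

Keeping the threshold search away from the `validation` split is what lets the final metric stay an unbiased estimate.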
Provider:
lacg030175