lacg030175/CICIDS2017
Source: Hugging Face (updated 2026-04-02, indexed 2026-03-29)
Download link:
https://hf-mirror.com/datasets/lacg030175/CICIDS2017
Description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1M<n<10M
task_categories:
- tabular-classification
tags:
- network-intrusion-detection
- cybersecurity
- CICIDS2017
- IDS
- binary-classification
pretty_name: CICIDS2017 Network Intrusion Detection
configs:
- config_name: temporal_3way
  data_files:
  - split: train
    path: temporal_3way/train-*
  - split: test
    path: temporal_3way/test-*
  - split: validation
    path: temporal_3way/validation-*
  default: true
- config_name: random_3way
  data_files:
  - split: train
    path: random_3way/train-*
  - split: test
    path: random_3way/test-*
  - split: validation
    path: random_3way/validation-*
- config_name: temporal
  data_files:
  - split: train
    path: temporal/train-*
  - split: test
    path: temporal/test-*
- config_name: standard
  data_files:
  - split: train
    path: temporal/train-*
  - split: test
    path: temporal/test-*
- config_name: random
  data_files:
  - split: train
    path: random/train-*
  - split: test
    path: random/test-*
---
# CICIDS2017 Network Intrusion Detection Dataset
The [CICIDS2017](https://www.unb.ca/cic/datasets/ids-2017.html) dataset from the Canadian Institute for Cybersecurity, provided with **temporal and random splits** for fair evaluation.
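## Configurations
The card metadata defines five configurations: `temporal_3way` (marked as the default), `random_3way`, `temporal`, `standard`, and `random`. The two `_3way` variants add a `validation` split; their row counts are not listed here.
### `temporal`: Day-Based Temporal Split
> **Note:** `standard` is an alias for `temporal`; both load the same data.

Train on Monday-Thursday, test on Friday. Friday's attack types (Bot, DDoS, PortScan) do not occur on the training days, so the model must generalize to attacks it has never seen.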
```python
from datasets import load_dataset
ds = load_dataset("lacg030175/CICIDS2017", "temporal") # or "standard"
# ds["train"]: 2,125,158 rows (Mon-Thu)
# ds["test"]: 702,718 rows (Friday)
```
- **Train attacks:** 267,771 / 2,125,158 (12.6%)
- **Test attacks:** 288,785 / 702,718 (41.1%)
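
The attack shares quoted above follow directly from the row counts. As a quick sanity check, the sketch below recomputes them. `attack_share` is a hypothetical helper (not part of any dataset tooling), and it assumes the binary `label` column uses 1 for attacks.

```python
def attack_share(labels):
    """Return the fraction of entries equal to 1 (attack) in a sequence of binary labels."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for y in labels if y == 1) / len(labels)


# Recompute the percentages from the counts stated in this card.
train_pct = round(100 * 267_771 / 2_125_158, 1)
test_pct = round(100 * 288_785 / 702_718, 1)
print(train_pct, test_pct)

# Hypothetical usage on a loaded split (requires network access):
# share = attack_share(ds["test"]["label"])
```

The large gap between the two shares is worth keeping in mind when choosing metrics, since accuracy on the training distribution may not transfer to the Friday test day.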
### `random`: Stratified Random Split
An 80/20 stratified random split over all days combined.
```python
from datasets import load_dataset

ds = load_dataset("lacg030175/CICIDS2017", "random")
# ds["train"]: 2,262,300 rows
# ds["test"]: 565,576 rows
```
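
The card does not say which column the split was stratified on or which tool produced it. As an illustration only, the sketch below shows what an 80/20 stratified split means, using the standard library and assuming stratification on the class label. `stratified_split` is a hypothetical helper, not the author's code.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for y in sorted(by_class):  # fixed class order keeps the result reproducible
        indices = by_class[y]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data: 80 benign flows and 20 attack flows.
labels = ["BENIGN"] * 80 + ["DDoS"] * 20
train_idx, test_idx = stratified_split(labels)
```

In practice the same effect is usually obtained with scikit-learn's `train_test_split(..., stratify=y)`; the hand-written version is shown only to make the per-class sampling explicit.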
## Top-20 RF Features
1. Bwd Packet Length Std
2. Destination Port
3. Packet Length Std
4. Bwd Packet Length Max
5. Avg Bwd Segment Size
6. Bwd Packet Length Mean
7. Fwd IAT Std
8. Average Packet Size
9. Packet Length Variance
10. Flow IAT Max
11. Packet Length Mean
12. Init_Win_bytes_forward
13. Idle Min
14. Idle Mean
15. Fwd IAT Max
16. Flow IAT Std
17. Flow Packets/s
18. Flow IAT Mean
19. Fwd Header Length
20. Bwd Header Length
## Attack Types
| Day | Attack Types |
|---|---|
| Monday | Benign only |
| Tuesday | FTP-Patator, SSH-Patator |
| Wednesday | DoS Hulk, DoS GoldenEye, DoS Slowhttptest, DoS slowloris, Heartbleed |
| Thursday | Web Attack (Brute Force, XSS, SQL Injection), Infiltration |
| **Friday (test)** | **Bot, DDoS, PortScan** |
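
The table implies that every Friday attack type is absent from the Monday-Thursday training days. A minimal sketch of how that could be checked on the multi-class labels follows. `unseen_attack_types` is a hypothetical helper, and the label strings in the toy lists are taken from the table above; the exact strings stored in the dataset are an assumption.

```python
def unseen_attack_types(train_labels, test_labels, benign="BENIGN"):
    """Return the sorted attack labels that occur in test but never in train."""
    seen = set(train_labels)
    return sorted({y for y in test_labels if y != benign and y not in seen})


# Toy label lists built from the day/attack table (not real data).
train_labels = ["BENIGN", "FTP-Patator", "SSH-Patator", "DoS Hulk", "Heartbleed", "Infiltration"]
test_labels = ["BENIGN", "Bot", "DDoS", "PortScan", "DDoS"]
unseen = unseen_attack_types(train_labels, test_labels)
print(unseen)

# Hypothetical usage on the loaded temporal split (requires network access):
# unseen = unseen_attack_types(ds["train"]["Label"], ds["test"]["Label"])
```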
## Labels
- **Binary** (`label`): 0 = BENIGN, 1 = Attack
- **Multi-class** (`Label`): 15 categories (BENIGN + 14 attack types)
## Features
78 numeric flow-level features extracted by CICFlowMeter.
## Preprocessing
- Removed rows with NaN/infinity values
- Stripped whitespace from column names and labels
- All features converted to numeric (float64)
- Added binary `label` column (0=BENIGN, 1=Attack)
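
The steps above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the author's actual script: the function name `preprocess`, the `errors="coerce"` handling of unparseable values, and the toy input frame are all assumptions. Column names are stripped first so that later lookups by name work, which means the order differs slightly from the list.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, label_col: str = "Label") -> pd.DataFrame:
    out = df.copy()

    # Strip whitespace from column names and from the label values.
    out.columns = out.columns.str.strip()
    out[label_col] = out[label_col].astype(str).str.strip()

    # Convert every feature column to float64; unparseable values become NaN (assumption).
    feature_cols = [c for c in out.columns if c != label_col]
    out[feature_cols] = out[feature_cols].apply(pd.to_numeric, errors="coerce").astype("float64")

    # Treat +/-infinity as missing, then drop rows with any missing feature.
    out[feature_cols] = out[feature_cols].replace([np.inf, -np.inf], np.nan)
    out = out.dropna(subset=feature_cols).reset_index(drop=True)

    # Binary target: 0 = BENIGN, 1 = any attack.
    out["label"] = (out[label_col] != "BENIGN").astype("int64")
    return out


# Toy frame with padded names, padded labels, and one infinite value.
raw = pd.DataFrame({
    " Flow Duration": [1, 2, 3],
    "Flow Bytes/s": [10.0, float("inf"), 5.0],
    " Label": ["BENIGN ", "DDoS", " PortScan"],
})
clean = preprocess(raw)
```

On the toy frame, the row containing the infinite value is dropped and the remaining rows get binary labels 0 and 1.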
## Citation
```bibtex
@inproceedings{sharafaldin2018toward,
title={Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization},
author={Sharafaldin, Iman and Lashkari, Arash Habibi and Ghorbani, Ali A},
booktitle={International Conference on Information Systems Security and Privacy},
year={2018}
}
```
## License
CC BY 4.0 — original dataset by the Canadian Institute for Cybersecurity, University of New Brunswick.
Provider:
lacg030175



