chandupulluru33/network-traffic-anomaly
Source: Hugging Face. Updated 2026-03-01; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/chandupulluru33/network-traffic-anomaly
Resource description:
---
license: mit
---
# Network Traffic Anomaly Dataset
A processed and curated dataset for **network traffic anomaly detection**, derived from the **CSE‑CIC‑IDS2018** intrusion detection dataset.
This dataset is designed for machine learning and deep learning research on **network security, intrusion detection, and anomaly detection**.
Dataset link: https://huggingface.co/datasets/abmallick/network-traffic-anomaly
> **Important:**
> This dataset was created from the **CSE‑CIC‑IDS2018 dataset** and restructured to be easier to use in modern ML workflows.
---
## 📊 Dataset Overview
- **Name:** Network Traffic Anomaly Dataset
- **Author:** Abhinav Mallick
- **Source Dataset:** CSE‑CIC‑IDS2018
- **Domain:** Network Security / Intrusion Detection
- **Modalities:** Tabular
- **Format:** Parquet
- **License:** MIT
---
## 🧠 Motivation
Raw intrusion detection datasets like CSE‑CIC‑IDS2018 are large, fragmented across multiple files, and often difficult to use directly for ML experiments.
This dataset was created to:
- Simplify access to IDS2018 data
- Provide a clean, ML‑ready format
- Enable rapid experimentation for anomaly detection models
- Support both classical ML and deep learning pipelines
It is suitable for **binary anomaly detection** as well as **multi‑class attack classification**.
---
## 📦 Source Dataset: CSE‑CIC‑IDS2018
The original **CSE‑CIC‑IDS2018** dataset was created by the Canadian Institute for Cybersecurity (CIC) and contains realistic benign and malicious network traffic captured over multiple days.
Key characteristics of the original dataset:
- Realistic enterprise network traffic
- Multiple attack categories (DoS, DDoS, brute force, infiltration, botnet, etc.)
- Flow‑based statistical features extracted using CICFlowMeter
This Hugging Face dataset is a **processed and consolidated version** of that data.
---
## 📋 Features / Columns
Each row represents a **network flow** with extracted statistical features.
Typical feature categories include:
- Flow duration and packet counts
- Forward and backward packet statistics
- Packet length statistics
- Inter‑arrival times
- Header and flag features
- Byte and packet rate metrics
### Key Columns
| Column | Description |
|------|------------|
| `label` | Target label (benign / attack or anomaly class) |
| `attack_type` | Specific attack category (if available) |
| `flow_duration` | Duration of the network flow |
| `total_fwd_packets` | Total forward packets |
| `total_bwd_packets` | Total backward packets |
| `flow_bytes_per_sec` | Bytes transferred per second |
| `flow_packets_per_sec` | Packets per second |
| `packet_length_mean` | Mean packet length |
| `packet_length_std` | Packet length standard deviation |
| `iat_mean` | Mean inter‑arrival time |
| `iat_std` | Inter‑arrival time standard deviation |
| `split` | Dataset split (`train` / `val` / `test`) |
> Exact columns may vary depending on preprocessing and feature selection.
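
Since the exact columns may vary, it can help to compare the loaded schema against the columns documented above before building a pipeline. The following is a minimal sketch: `EXPECTED_COLUMNS` is copied from the table, and `missing_columns` is a hypothetical helper name, not part of the dataset or of any library.

```python
# Column names taken from the "Key Columns" table; the real schema may differ.
EXPECTED_COLUMNS = [
    "label", "attack_type", "flow_duration",
    "total_fwd_packets", "total_bwd_packets",
    "flow_bytes_per_sec", "flow_packets_per_sec",
    "packet_length_mean", "packet_length_std",
    "iat_mean", "iat_std", "split",
]


def missing_columns(available, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from `available`, in table order."""
    present = set(available)
    return [name for name in expected if name not in present]


# Made-up schema that lacks two of the documented columns.
example_schema = [c for c in EXPECTED_COLUMNS if c not in ("attack_type", "split")]
print(missing_columns(example_schema))
```

With a loaded Hugging Face dataset, `missing_columns(dataset.column_names)` would report which of the documented columns are not present.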
---
## 🧩 Labels
Depending on usage, labels can be interpreted as:
### Binary Classification
- **0:** Benign traffic
- **1:** Anomalous / Malicious traffic
### Multi‑Class Classification
- Benign
- DoS / DDoS
- Brute Force
- Botnet
- Infiltration
- Web attacks
- Other attack types
Users are free to remap labels based on their modeling needs.
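As one possible remapping, multi-class labels can be collapsed into the binary scheme above. This sketch assumes benign flows are marked either by the integer `0` or by a string such as `"Benign"`; the actual label encoding is not specified here, and `to_binary_label` is a hypothetical helper.

```python
def to_binary_label(label, benign_names=("benign",)):
    """Map a label to 0 (benign) or 1 (anomalous).

    Assumption: benign flows are marked either by the integer 0 or by a
    string such as "Benign"; every other value is treated as an attack.
    """
    if isinstance(label, str):
        return 0 if label.strip().lower() in benign_names else 1
    return 0 if label == 0 else 1


# Made-up label values for illustration.
labels = ["Benign", "DoS", "benign ", "Botnet", 0, 1]
print([to_binary_label(x) for x in labels])
```

With the `datasets` library, the same function could be applied row by row, for example `dataset.map(lambda row: {"binary_label": to_binary_label(row["label"])})`, assuming the target column is named `label`.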
---
## 🚀 Quick Start
### Installation
```bash
pip install datasets pandas pyarrow
```
### Load Dataset
```python
from datasets import load_dataset

# Load the training split from the Hugging Face Hub
dataset = load_dataset(
    "abmallick/network-traffic-anomaly",
    split="train",
)

# Inspect the first flow record
print(dataset[0])
```
### Convert to Pandas
```python
# Convert the Hugging Face Dataset to a pandas DataFrame
df = dataset.to_pandas()
print(df.head())
```
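Intrusion detection data is often dominated by benign traffic, so it may be worth checking the class balance before training. The sketch below uses only the standard library; `class_distribution` is a hypothetical helper, and the sample labels are made up rather than taken from the dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, fraction)} ordered by descending count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (count, count / total)
        for label, count in counts.most_common()
    }


# Made-up labels for illustration; real class proportions are not stated here.
sample = ["Benign"] * 8 + ["DoS"] * 1 + ["Botnet"] * 1
for label, (count, fraction) in class_distribution(sample).items():
    print(f"{label:<8} {count:>3}  {fraction:.0%}")
```

Any iterable of labels works, so a pandas column such as `df["label"]` could be passed in directly, assuming that column exists.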
---
## 📈 Example Use Cases
- Network intrusion detection systems (IDS)
- Anomaly detection using autoencoders or isolation forests
- Supervised attack classification models
- Benchmarking ML models on real‑world network traffic
- Security analytics and SOC research
---
## 🧪 Suggested Evaluation Metrics
- Accuracy
- Precision / Recall
- F1‑score
- ROC‑AUC
- False Positive Rate (critical for intrusion detection systems)
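
Most of these metrics follow directly from the confusion-matrix counts. The following is a minimal, dependency-free sketch for the binary case, assuming `1` marks an attack and `0` marks benign traffic; `binary_metrics` is a hypothetical helper, and the labels are toy values. ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute metrics for binary labels where 1 = attack and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Toy labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

The false positive rate is computed over benign flows only (`fp / (fp + tn)`), which is why it can stay low even when accuracy looks mediocre, and vice versa.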
---
## 📚 Citation
If you use this dataset, please cite both this dataset and the original source:
### This Dataset
```bibtex
@misc{mallick2025networktraffic,
  title={Network Traffic Anomaly Dataset},
  author={Mallick, Abhinav},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/abmallick/network-traffic-anomaly}
}
```
### Original Dataset (CSE‑CIC‑IDS2018)
```bibtex
@dataset{cse_cic_ids2018,
  title={CSE-CIC-IDS2018: A Large Scale Dataset for Intrusion Detection Systems},
  author={Sharafaldin, Iman and Lashkari, Arash Habibi and Ghorbani, Ali A.},
  year={2018},
  publisher={Canadian Institute for Cybersecurity}
}
```
---
## 📄 License
This dataset is released under the **MIT License**.
The original CSE‑CIC‑IDS2018 dataset is subject to its own licensing terms.
---
## 🤝 Contributions
Feedback, issues, and improvements are welcome via the Hugging Face dataset page.
Provided by:
chandupulluru33



