
chandupulluru33/network-traffic-anomaly

Source: Hugging Face. Updated 2026-03-01. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/chandupulluru33/network-traffic-anomaly
Description:
---
license: mit
---

# Network Traffic Anomaly Dataset

A processed and curated dataset for **network traffic anomaly detection**, derived from the **CSE‑CIC‑IDS2018** intrusion detection dataset.

This dataset is designed for machine learning and deep learning research on **network security, intrusion detection, and anomaly detection**.

Dataset link: https://huggingface.co/datasets/abmallick/network-traffic-anomaly

> **Important:**
> This dataset is created from the **CSE‑CIC‑IDS2018 dataset** and restructured to be easier to use for modern ML workflows.

---

## 📊 Dataset Overview

- **Name:** Network Traffic Anomaly Dataset
- **Author:** Abhinav Mallick
- **Source Dataset:** CSE‑CIC‑IDS2018
- **Domain:** Network Security / Intrusion Detection
- **Modalities:** Tabular
- **Format:** Parquet
- **License:** MIT

---

## 🧠 Motivation

Raw intrusion detection datasets like CSE‑CIC‑IDS2018 are large, fragmented across multiple files, and often difficult to use directly for ML experiments.

This dataset was created to:

- Simplify access to IDS2018 data
- Provide a clean, ML‑ready format
- Enable rapid experimentation for anomaly detection models
- Support both classical ML and deep learning pipelines

It is suitable for **binary anomaly detection** as well as **multi‑class attack classification**.

---

## 📦 Source Dataset: CSE‑CIC‑IDS2018

The original **CSE‑CIC‑IDS2018** dataset was created by the Canadian Institute for Cybersecurity (CIC) and contains realistic benign and malicious network traffic captured over multiple days.

Key characteristics of the original dataset:

- Realistic enterprise network traffic
- Multiple attack categories (DoS, DDoS, brute force, infiltration, botnet, etc.)
- Flow‑based statistical features extracted using CICFlowMeter

This Hugging Face dataset is a **processed and consolidated version** of that data.

---

## 📋 Features / Columns

Each row represents a **network flow** with extracted statistical features.

Typical feature categories include:

- Flow duration and packet counts
- Forward and backward packet statistics
- Packet length statistics
- Inter‑arrival times
- Header and flag features
- Byte and packet rate metrics

### Key Columns

| Column | Description |
|------|------------|
| `label` | Target label (benign / attack or anomaly class) |
| `attack_type` | Specific attack category (if available) |
| `flow_duration` | Duration of the network flow |
| `total_fwd_packets` | Total forward packets |
| `total_bwd_packets` | Total backward packets |
| `flow_bytes_per_sec` | Bytes transferred per second |
| `flow_packets_per_sec` | Packets per second |
| `packet_length_mean` | Mean packet length |
| `packet_length_std` | Packet length standard deviation |
| `iat_mean` | Mean inter‑arrival time |
| `iat_std` | Inter‑arrival time standard deviation |
| `split` | Dataset split (`train` / `val` / `test`) |

> Exact columns may vary depending on preprocessing and feature selection.

---

## 🧩 Labels

Depending on usage, labels can be interpreted as:

### Binary Classification

- **0:** Benign traffic
- **1:** Anomalous / Malicious traffic

### Multi‑Class Classification

- Benign
- DoS / DDoS
- Brute Force
- Botnet
- Infiltration
- Web attacks
- Other attack types

Users are free to remap labels based on their modeling needs.
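
The remapping described above can be sketched as follows. This is a minimal illustration, not the dataset author's method: the raw label strings, the `BENIGN_VALUES` and `COARSE_CLASSES` tables, and the helpers `to_binary` and `to_coarse_class` are all assumptions made for the example. Inspect the actual values in the `label` column before relying on any of them.

```python
# Hypothetical raw label values; check the real `label` column first.
BENIGN_VALUES = {"benign", "0"}

# Hypothetical mapping from raw labels to the coarse classes listed above.
COARSE_CLASSES = {
    "benign": "Benign",
    "dos": "DoS / DDoS",
    "ddos": "DoS / DDoS",
    "brute force": "Brute Force",
    "botnet": "Botnet",
    "infiltration": "Infiltration",
    "web attack": "Web attacks",
}


def normalize(label):
    """Lower-case a raw label and strip surrounding whitespace."""
    return str(label).strip().lower()


def to_binary(label):
    """Return 0 for benign traffic and 1 for anything else."""
    return 0 if normalize(label) in BENIGN_VALUES else 1


def to_coarse_class(label):
    """Map a raw label to a coarse class, falling back to 'Other'."""
    return COARSE_CLASSES.get(normalize(label), "Other")


raw_labels = ["Benign", "DoS", "botnet ", "SQL Injection", 0]
binary = [to_binary(x) for x in raw_labels]
coarse = [to_coarse_class(x) for x in raw_labels[:4]]
print(binary)
print(coarse)
```

With a pandas DataFrame, the same helpers can be applied column-wise, for example `df["label"].map(to_binary)`.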

---

## 🚀 Quick Start

### Installation

```bash
pip install datasets pandas pyarrow
```

### Load Dataset

```python
from datasets import load_dataset

dataset = load_dataset(
    "abmallick/network-traffic-anomaly",
    split="train"
)

print(dataset[0])
```

### Convert to Pandas

```python
df = dataset.to_pandas()
df.head()
```

---

## 📈 Example Use Cases

- Network intrusion detection systems (IDS)
- Anomaly detection using autoencoders or isolation forests
- Supervised attack classification models
- Benchmarking ML models on real‑world network traffic
- Security analytics and SOC research

---

## 🧪 Suggested Evaluation Metrics

- Accuracy
- Precision / Recall
- F1‑score
- ROC‑AUC
- False Positive Rate (critical for IDS systems)

---

## 📚 Citation

If you use this dataset, please cite both this dataset and the original source:

### This Dataset

```bibtex
@misc{mallick2025networktraffic,
  title={Network Traffic Anomaly Dataset},
  author={Mallick, Abhinav},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/abmallick/network-traffic-anomaly}
}
```

### Original Dataset (CSE‑CIC‑IDS2018)

```bibtex
@dataset{cse_cic_ids2018,
  title={CSE-CIC-IDS2018: A Large Scale Dataset for Intrusion Detection Systems},
  author={Sharafaldin, Iman and Lashkari, Arash Habibi and Ghorbani, Ali A.},
  year={2018},
  publisher={Canadian Institute for Cybersecurity}
}
```

---

## 📄 License

This dataset is released under the **MIT License**.
The original CSE‑CIC‑IDS2018 dataset is subject to its own licensing terms.

---

## 🤝 Contributions

Feedback, issues, and improvements are welcome via the Hugging Face dataset page.
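
As a supplement to the suggested evaluation metrics listed earlier, the threshold-based ones (accuracy, precision, recall, F1, false positive rate) all follow from the four confusion-matrix counts. The sketch below assumes binary labels where 1 marks anomalous traffic. `binary_metrics` is a hypothetical helper, and the toy `y_true` / `y_pred` lists are invented for illustration; they are not results from this dataset.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and FPR for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # False positive rate: share of benign flows flagged as anomalous.
    fpr = fp / (fp + tn) if fp + tn else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
    }


# Toy example (made-up values): 5 benign flows, 3 attack flows.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because benign flows usually far outnumber attacks in IDS data, accuracy alone can look high while the false positive rate still produces an unmanageable number of alerts, which is why it is worth reporting separately.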
Provider:
chandupulluru33