
Processed_ADFA_LD dataset

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/853sxvpx79
Description:
The Processed_ADFA_LD dataset is a preprocessed version of the ADFA Linux Dataset (ADFA-LD), which was originally developed by the Australian Defence Force Academy (ADFA) for research in host-based intrusion detection systems (HIDS). The dataset focuses on detecting malicious behavior at the operating system level by analyzing system call traces generated by Linux processes.

Original ADFA-LD Dataset Overview

The original ADFA-LD dataset contains system call sequences collected from a Linux environment under two main conditions:

- Normal behavior: Legitimate system activities generated by standard user operations.
- Attack behavior: System call traces produced during various cyber-attacks, including privilege escalation, denial-of-service, and remote exploits.

Unlike older datasets (e.g., KDD Cup 99), ADFA-LD was designed to reflect modern Linux systems and more realistic attack scenarios.

Processing and Transformation

The Processed_ADFA_LD dataset refers to the cleaned and transformed version of the original dataset to make it suitable for machine learning and deep learning models. Common preprocessing steps include:

- System call encoding (e.g., integer mapping or frequency-based encoding)
- Sequence normalization or padding to handle variable-length traces
- Feature extraction, such as:
  - n-grams of system calls
  - statistical features (frequency, entropy, transition probabilities)
- Labeling, typically:
  - 0 → Normal behavior
  - 1 → Attack behavior
- Train-test splitting for supervised learning experiments

Data Characteristics

- Data type: Sequential / time-series data
- Features: Encoded system call sequences or derived statistical features
- Labels: Binary (Normal vs. Attack) or multiclass (depending on processing)
- Domain: Cybersecurity, Host-Based Intrusion Detection
- Operating System: Linux

Applications

The Processed_ADFA_LD dataset is widely used for:

- Intrusion detection system (IDS) evaluation
- Anomaly detection research
- Benchmarking machine learning and deep learning models such as:
  - LSTM / GRU
  - CNN-based sequence models
  - Autoencoders
  - Traditional classifiers (SVM, Random Forest, k-NN)

Advantages

- Reflects realistic modern attack behavior
- Avoids outdated network-focused features
- Well-suited for sequence-based learning models

Limitations

- Host-specific (Linux-only)
- Limited attack diversity compared to large-scale enterprise datasets
- Requires careful preprocessing due to variable-length sequences
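
The preprocessing steps named in the description (integer encoding, padding of variable-length traces, n-gram and entropy features, binary labels) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the dataset authors' pipeline: it assumes each trace is a whitespace-separated string of system call numbers, and the toy traces, the padding value, and all function names are hypothetical.

```python
from collections import Counter
import math

# Assumed padding value; if 0 occurs as a real call number in your data,
# shift the call numbers or pick another value.
PAD = 0

def parse_trace(line):
    """Turn a whitespace-separated string of system call numbers into a list of ints."""
    return [int(tok) for tok in line.split()]

def pad_or_truncate(seq, length, pad=PAD):
    """Force a sequence to a fixed length by truncating or right-padding."""
    return seq[:length] + [pad] * max(0, length - len(seq))

def ngram_counts(seq, n):
    """Count contiguous n-grams of system calls in one trace."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def entropy(seq):
    """Shannon entropy (in bits) of the system call frequency distribution."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy traces, not taken from the dataset.
traces = ["6 6 63 6 42", "6 11 45"]
labels = [0, 1]  # 0 = normal, 1 = attack, following the labeling in the description

parsed = [parse_trace(t) for t in traces]
padded = [pad_or_truncate(s, 4) for s in parsed]   # fixed-length input for sequence models
bigrams = ngram_counts(parsed[0], 2)               # sparse features for classical classifiers
```

Fixed-length padded sequences suit the sequence models listed under Applications (LSTM / GRU, CNN), while n-gram counts and entropy give a per-trace feature vector for classifiers such as SVM or Random Forest.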
Created:
2026-01-30