Processed_ADFA_LD dataset
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/853sxvpx79
Description:
The Processed_ADFA_LD dataset is a preprocessed version of the ADFA Linux Dataset (ADFA-LD), which was originally developed at the Australian Defence Force Academy (ADFA) for research on host-based intrusion detection systems (HIDS). It targets the detection of malicious behavior at the operating-system level through analysis of the system call traces generated by Linux processes.
Original ADFA-LD Dataset Overview
The original ADFA-LD dataset contains system call sequences collected from a Linux environment under two main conditions:
Normal behavior: Legitimate system activities generated by standard user operations.
Attack behavior: System call traces recorded while various cyber-attacks were carried out, including brute-force password attacks, privilege escalation, and remote exploits (e.g., Meterpreter payloads and web shells).
Unlike older datasets (e.g., KDD Cup 99), ADFA-LD was designed to reflect modern Linux systems and more realistic attack scenarios.
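To make the structure of the raw data concrete, here is a minimal Python sketch for loading it. It assumes, as in the commonly distributed ADFA-LD archive, that each trace is a plain-text file of whitespace-separated integer system call numbers and that normal and attack traces are stored in separate directories. The `*.txt` pattern, the directory arguments and the helper names `parse_trace` and `load_labeled_traces` are hypothetical and should be adapted to the actual files.

```python
from pathlib import Path


def parse_trace(text: str) -> list[int]:
    """Parse one trace: whitespace-separated integer system call numbers."""
    return [int(tok) for tok in text.split()]


def load_labeled_traces(normal_dir: str, attack_dir: str) -> tuple[list[list[int]], list[int]]:
    """Load every *.txt trace under two directories (assumed layout).

    Traces under normal_dir get label 0 and traces under attack_dir get label 1.
    rglob searches recursively, in case attack traces sit in per-attack subfolders.
    """
    traces: list[list[int]] = []
    labels: list[int] = []
    for directory, label in ((normal_dir, 0), (attack_dir, 1)):
        for path in sorted(Path(directory).rglob("*.txt")):
            traces.append(parse_trace(path.read_text()))
            labels.append(label)
    return traces, labels
```

If attack recordings are grouped into per-attack subfolders, the subfolder name could also serve as a multiclass label instead of the single value 1.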
Processing and Transformation
Processed_ADFA_LD is a cleaned and transformed version of the original dataset, prepared for use with machine learning and deep learning models. Common preprocessing steps include:
System call encoding (e.g., integer mapping or frequency-based encoding)
Sequence normalization or padding to handle variable-length traces
Feature extraction, such as:
n-grams of system calls
statistical features (frequency, entropy, transition probabilities)
Labeling, typically:
0 → Normal behavior
1 → Attack behavior
Train-test splitting for supervised learning experiments
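The encoding, padding and feature-extraction steps listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the procedure actually used to build Processed_ADFA_LD: index 0 is reserved for padding, index 1 for system calls unseen when the vocabulary was built, and every function name is hypothetical.

```python
import math
from collections import Counter

PAD = 0  # reserved index for padding positions (assumption)
UNK = 1  # reserved index for calls not seen when the vocabulary was built (assumption)


def build_vocab(traces: list[list[int]]) -> dict[int, int]:
    """Map each distinct system call number to a dense index starting at 2."""
    calls = sorted({call for trace in traces for call in trace})
    return {call: i + 2 for i, call in enumerate(calls)}


def encode(trace: list[int], vocab: dict[int, int]) -> list[int]:
    """Integer-map a trace; calls missing from the vocabulary become UNK."""
    return [vocab.get(call, UNK) for call in trace]


def pad_or_truncate(seq: list[int], max_len: int) -> list[int]:
    """Keep the first max_len items, right-padding with PAD if the sequence is shorter."""
    return seq[:max_len] + [PAD] * max(0, max_len - len(seq))


def ngram_counts(trace: list[int], n: int = 3) -> Counter:
    """Count overlapping n-grams (as tuples); empty if the trace is shorter than n."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def call_entropy(trace: list[int]) -> float:
    """Shannon entropy (bits) of the system call frequency distribution of one trace."""
    total = len(trace)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(trace).values())


def transition_probs(trace: list[int]) -> dict[tuple[int, int], float]:
    """First-order transition probabilities P(next call | current call) within one trace."""
    pair_counts = Counter(zip(trace, trace[1:]))
    from_counts = Counter(trace[:-1])
    return {(a, b): c / from_counts[a] for (a, b), c in pair_counts.items()}
```

The vocabulary should be built from training traces only, so that calls appearing only in test data map to UNK rather than leaking information from the test set.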
Data Characteristics
Data type: Sequential data (ordered system call traces)
Features: Encoded system call sequences or derived statistical features
Labels: Binary (Normal vs. Attack) or multiclass (depending on processing)
Domain: Cybersecurity, Host-Based Intrusion Detection
Operating System: Linux
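In ADFA-LD, normal traces outnumber attack traces, so a stratified split helps keep the class ratio comparable between the training and test sets. The sketch below works for both the binary and the multiclass labelings mentioned above; the helper name `stratified_split`, the default 30% test fraction and the fixed seed are arbitrary assumptions.

```python
import random


def stratified_split(
    labels: list[int], test_fraction: float = 0.3, seed: int = 0
) -> tuple[list[int], list[int]]:
    """Return (train_idx, test_idx) with roughly the same class ratio in both parts.

    Works for binary (0/1) or multiclass integer labels.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    by_class: dict[int, list[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx: list[int] = []
    test_idx: list[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-class test count
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)
```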
Applications
The Processed_ADFA_LD dataset can be used for:
Intrusion detection system (IDS) evaluation
Anomaly detection research
Benchmarking machine learning and deep learning models such as:
LSTM / GRU
CNN-based sequence models
Autoencoders
Traditional classifiers (SVM, Random Forest, k-NN)
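As a simple reference point for anomaly-detection experiments, the sketch below implements a STIDE-style detector (sequence time-delay embedding), a classic system call baseline swapped in here for illustration rather than taken from the dataset description. It records every length-n window seen in normal training traces and scores a new trace by the fraction of its windows that never appeared in that normal database. The window length of 6, the threshold of 0.1 and all function names are assumptions to be tuned.

```python
def windows(trace: list[int], n: int) -> list[tuple[int, ...]]:
    """All overlapping length-n windows of a trace (empty if the trace is shorter than n)."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_normal_db(normal_traces: list[list[int]], n: int = 6) -> set[tuple[int, ...]]:
    """Collect every length-n window that occurs in the normal training traces."""
    return {w for trace in normal_traces for w in windows(trace, n)}


def anomaly_score(trace: list[int], db: set[tuple[int, ...]], n: int = 6) -> float:
    """Fraction of the trace's windows never seen in normal data (0.0 = fully familiar).

    Traces shorter than n have no windows and are scored 0.0 here, a simplifying choice.
    """
    ws = windows(trace, n)
    if not ws:
        return 0.0
    return sum(w not in db for w in ws) / len(ws)


def classify(
    trace: list[int], db: set[tuple[int, ...]], n: int = 6, threshold: float = 0.1
) -> int:
    """Return 1 (attack) if the anomaly score exceeds the threshold, else 0 (normal)."""
    return int(anomaly_score(trace, db, n) > threshold)
```

Because it needs only normal traces for training, this kind of detector fits the anomaly-detection setting; the supervised classifiers listed above would instead train on the 0/1 labels.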
Advantages
Reflects realistic modern attack behavior
Avoids outdated network-focused features
Well-suited for sequence-based learning models
Limitations
Host-specific (Linux-only)
Limited attack diversity compared to large-scale enterprise datasets
Requires careful preprocessing due to variable-length sequences
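One common way to handle variable-length sequences, as an alternative to padding or truncating whole traces, is to cut each trace into fixed-length sliding windows that inherit the trace's label. A minimal sketch follows; `sliding_windows`, `window_dataset`, the window size and the padding value 0 are hypothetical choices.

```python
def sliding_windows(trace: list[int], size: int, step: int = 1, pad: int = 0) -> list[list[int]]:
    """Cut a variable-length trace into fixed-length windows.

    A trace shorter than `size` yields one right-padded window, so no trace is lost.
    With step > 1, trailing calls that do not complete a full window are dropped.
    """
    if len(trace) < size:
        return [trace + [pad] * (size - len(trace))]
    return [trace[i:i + size] for i in range(0, len(trace) - size + 1, step)]


def window_dataset(
    traces: list[list[int]], labels: list[int], size: int, step: int = 1
) -> list[tuple[list[int], int, int]]:
    """Expand traces into (window, label, trace_id) rows; each window inherits its trace's label."""
    rows = []
    for trace_id, (trace, label) in enumerate(zip(traces, labels)):
        for window in sliding_windows(trace, size, step):
            rows.append((window, label, trace_id))
    return rows
```

Keeping `trace_id` with each window makes it possible to split the data by trace, so that windows from one trace never land in both the training and test sets, and to aggregate window predictions back into a trace-level decision.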
Created:
2026-01-30



