Processed_ADFA_LD dataset

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/853sxvpx79

Description:

The Processed_ADFA_LD dataset is a preprocessed version of the ADFA Linux Dataset (ADFA-LD), which was originally developed by the Australian Defence Force Academy (ADFA) for research in host-based intrusion detection systems (HIDS). The dataset focuses on detecting malicious behavior at the operating system level by analyzing system call traces generated by Linux processes.

Original ADFA-LD Dataset Overview

The original ADFA-LD dataset contains system call sequences collected from a Linux environment under two main conditions:

- Normal behavior: Legitimate system activities generated by standard user operations.
- Attack behavior: System call traces produced during various cyber-attacks, including privilege escalation, denial-of-service, and remote exploits.

Unlike older datasets (e.g., KDD Cup 99), ADFA-LD was designed to reflect modern Linux systems and more realistic attack scenarios.

Processing and Transformation

The Processed_ADFA_LD dataset refers to the cleaned and transformed version of the original dataset to make it suitable for machine learning and deep learning models. Common preprocessing steps include:

- System call encoding (e.g., integer mapping or frequency-based encoding)
- Sequence normalization or padding to handle variable-length traces
- Feature extraction, such as:
  - n-grams of system calls
  - statistical features (frequency, entropy, transition probabilities)
- Labeling, typically:
  - 0 → Normal behavior
  - 1 → Attack behavior
- Train-test splitting for supervised learning experiments

Data Characteristics

- Data type: Sequential / time-series data
- Features: Encoded system call sequences or derived statistical features
- Labels: Binary (Normal vs. Attack) or multiclass (depending on processing)
- Domain: Cybersecurity, Host-Based Intrusion Detection
- Operating System: Linux

Applications

The Processed_ADFA_LD dataset is widely used for:

- Intrusion detection system (IDS) evaluation
- Anomaly detection research
- Benchmarking machine learning and deep learning models such as:
  - LSTM / GRU
  - CNN-based sequence models
  - Autoencoders
  - Traditional classifiers (SVM, Random Forest, k-NN)

Advantages

- Reflects realistic modern attack behavior
- Avoids outdated network-focused features
- Well-suited for sequence-based learning models

Limitations

- Host-specific (Linux-only)
- Limited attack diversity compared to large-scale enterprise datasets
- Requires careful preprocessing due to variable-length sequences
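
Example: preprocessing sketch

The preprocessing steps listed in the description (integer encoding, padding, n-gram extraction, and an entropy feature) can be illustrated with a minimal Python sketch. It is not the dataset authors' pipeline. It assumes each trace is already available as a list of integer system call numbers; the file layout of Processed_ADFA_LD is not described here, so loading is left out. The function names, the PAD_ID value, the max_len parameter, and the toy traces are all hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2

PAD_ID = 0  # assumption: 0 is reserved for padding, so encoded calls use 1..N


def build_vocab(traces):
    """Map each distinct system call number to a dense integer ID starting at 1."""
    calls = sorted({call for trace in traces for call in trace})
    return {call: idx for idx, call in enumerate(calls, start=1)}


def encode_and_pad(trace, vocab, max_len):
    """Integer-encode a trace, then truncate or right-pad it to max_len."""
    ids = [vocab[call] for call in trace][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def ngram_counts(trace, n):
    """Count every contiguous run of n system calls in a trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def call_entropy(trace):
    """Shannon entropy (in bits) of the system call frequency distribution."""
    counts = Counter(trace)
    total = len(trace)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Toy traces with made-up system call numbers; label 0 = normal, 1 = attack.
traces = [[6, 6, 63, 6, 42], [11, 45, 45, 11]]
labels = [0, 1]

vocab = build_vocab(traces)
padded = [encode_and_pad(t, vocab, max_len=6) for t in traces]
bigrams = ngram_counts(traces[0], n=2)
entropy = call_entropy(traces[1])
```

Reserving ID 0 for padding keeps padded positions distinguishable from real system calls, which is what sequence models such as LSTM or GRU typically need in order to mask them. The n-gram counts and the entropy value are the kind of fixed-length features that traditional classifiers can consume in place of the raw sequence.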

Created:

2026-01-30
