DReLAB dataset - Deep REinforcement Learning Adversarial Botnet dataset
Source: Mendeley Data, indexed 2026-04-18
Download link:
https://data.mendeley.com/datasets/nf22d786tj
Description:
We present the first dataset designed to serve as a benchmark for validating the resilience of botnet detectors against adversarial attacks. The dataset includes realistic adversarial samples generated automatically with two widely used Deep Reinforcement Learning (DRL) techniques. These adversarial samples have been shown to evade state-of-the-art detectors based on both Machine Learning and Deep Learning algorithms. The initial corpus of malicious samples consists of network flows belonging to different botnet families, taken from three public datasets that contain real enterprise network traffic. We use these datasets to build detectors that achieve state-of-the-art performance. We then train two DRL agents, based on Double Deep Q-Network and Deep Sarsa, to generate realistic adversarial samples: the goal is to cause misclassifications by making small modifications to the initial malicious samples. These alterations involve only the features that an expert attacker could realistically alter, and they do not compromise the underlying malicious logic of the original samples. Our dataset is an important contribution to the cybersecurity research community, as it is the first to include thousands of automatically generated adversarial samples that can thwart state-of-the-art classifiers with a high evasion rate. The adversarial samples are grouped by malware variant and provided in CSV format. Researchers can validate their defensive proposals by testing their detectors against the adversarial samples in the proposed dataset. Moreover, analyzing and studying these samples can pave the way toward a deeper understanding of adversarial attacks and the development of new, effective defensive techniques, and can also benefit work on the explainability of machine learning algorithms.
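The two agents named above differ mainly in how they bootstrap their learning target. For reference, the textbook one-step targets are shown below. This description does not specify the state, action, and reward definitions used to build DReLAB, so reading an action as one "small modification" to an alterable flow feature is an assumption. Here $\theta$ denotes the online network weights, $\theta^-$ the target network weights, $\gamma$ the discount factor, and $(s, a, r, s', a')$ a transition together with the next action actually chosen by the agent's policy.

```latex
% Double DQN: the online network selects the next action, the target network evaluates it
y^{\mathrm{DDQN}} = r + \gamma \, Q\!\left(s', \arg\max_{b} Q(s', b; \theta); \theta^{-}\right)

% Deep Sarsa, basic on-policy form: bootstrap from the action a' actually taken in s'
y^{\mathrm{Sarsa}} = r + \gamma \, Q(s', a'; \theta)

% Both agents minimise the squared temporal-difference error
\mathcal{L}(\theta) = \left(y - Q(s, a; \theta)\right)^{2}

% If s' is terminal (the episode ends), y = r in both cases
```

Decoupling action selection from action evaluation is what lets Double DQN reduce the overestimation bias of plain DQN. Sarsa instead learns the value of the policy it actually follows, exploration included.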
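A detector's resilience against these samples is commonly summarized by its evasion rate, defined as the fraction of adversarial (and therefore truly malicious) samples that it fails to flag. The following minimal sketch rests on several assumptions. The helper names `load_features`, `evasion_rate`, and `evaluate_detector` are hypothetical. The CSV column names must be taken from the actual files, because they are not listed here. The detector is assumed to return `1` for malicious and `0` for benign.

```python
import csv
from typing import Callable, Iterable, Sequence


# Hypothetical helper: reads selected numeric feature columns from CSV lines.
# The real DReLAB column names are NOT assumed here; the caller supplies them.
def load_features(lines: Iterable[str], feature_columns: Sequence[str]) -> list[list[float]]:
    reader = csv.DictReader(lines)  # the first line is treated as the header
    return [[float(row[col]) for col in feature_columns] for row in reader]


# Every adversarial sample is malicious by construction, so any prediction
# other than the malicious label counts as a successful evasion.
def evasion_rate(predictions: Iterable[int], malicious_label: int = 1) -> float:
    preds = list(predictions)
    if not preds:
        raise ValueError("no predictions to score")
    evaded = sum(1 for p in preds if p != malicious_label)
    return evaded / len(preds)


# Hypothetical wrapper: `detector` maps one feature vector to a label
# (assumed convention: 1 = malicious, 0 = benign).
def evaluate_detector(
    detector: Callable[[list[float]], int],
    samples: Iterable[list[float]],
) -> float:
    return evasion_rate(detector(x) for x in samples)
```

As a usage example, you could open one variant's file with `open(path, newline="")`, pass the file object to `load_features`, and then call `evaluate_detector` with a wrapper such as `lambda x: int(clf.predict([x])[0])` around an already trained scikit-learn classifier `clf`. The selected feature columns must match those the detector was trained on. Because the samples are grouped by malware variant, you can compute one evasion rate per file to see which families evade the detector most often.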
Created:
2020-12-09



