DNS Exfiltration Dataset

Indexed from the NIAID Data Ecosystem on 2026-05-01

Download link:

https://data.mendeley.com/datasets/c4n7fckkz3


Resource description:

The DNS exfiltration dataset was recorded in a realistic network environment: more than 50 million DNS requests were captured on one of an ISP's DNS servers. All IP addresses in the dataset were anonymised using an injective mapping.

Features are split into single-request features and aggregate features. Single-request (DNS label-based) features are calculated for each DNS request independently, using only the textual characteristics of the request. Aggregate features are calculated over multiple subsequent requests from one client to a particular TLD; computing them reduces the dataset to about 35 million records. The complete list of features with descriptions is given in dataset_description.txt. All features based on finding English words in the request use a list of the roughly 60,000 most common English words, provided in english_words.txt.

The main dataset (dataset.csv) contains regular requests and exfiltrations performed with the DNSExfiltrator and Iodine tools. The additional dataset (dataset_modified.csv) contains only exfiltrations executed with a modified DNSExfiltrator tool. In this dataset, waiting times between consecutive requests are randomised and the requests have lower entropy, which makes detection much harder.

If you use this dataset in your research, please cite: Žiža, K., Tadić, P. & Vuletić, P. DNS exfiltration detection in the presence of adversarial attacks and modified exfiltrator behaviour. Int. J. Inf. Secur. (2023). https://doi.org/10.1007/s10207-023-00723-w
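
The split between single-request and aggregate features can be illustrated with a short Python sketch. It is not the authors' feature extractor, and the exact feature definitions in dataset_description.txt are not reproduced here. All function names are hypothetical, and several simplifications are assumptions: the TLD is taken naively as the last label of the query name, English words are matched as substrings of at least three characters against a word set (for example one built from english_words.txt), and aggregates are computed over all requests of a (client, TLD) pair rather than over a specific window.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string in bits per character (0.0 for empty input)."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def split_query(qname: str) -> tuple[str, str]:
    """Split a query name into (everything before the TLD, TLD).

    Assumption: the TLD is simply the last label; public-suffix rules are ignored.
    """
    labels = qname.rstrip(".").split(".")
    return ".".join(labels[:-1]), labels[-1]


def english_word_count(text: str, words: set[str], min_len: int = 3) -> int:
    """Hypothetical single-request feature: number of distinct dictionary words
    (length >= min_len) that occur as substrings of the text."""
    text = text.lower()
    found = set()
    for i in range(len(text)):
        for j in range(i + min_len, len(text) + 1):
            if text[i:j] in words:
                found.add(text[i:j])
    return len(found)


def aggregate_features(requests):
    """Hypothetical aggregate features per (client, TLD) pair.

    `requests` is an iterable of (timestamp_seconds, client_ip, qname) tuples,
    assumed to be sorted by timestamp.
    """
    groups = defaultdict(list)
    for ts, client, qname in requests:
        prefix, tld = split_query(qname)
        groups[(client, tld)].append((ts, prefix))

    features = {}
    for key, items in groups.items():
        times = [ts for ts, _ in items]
        # Waiting times between consecutive requests of one client to one TLD.
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[key] = {
            "n_requests": len(items),
            "mean_entropy": sum(shannon_entropy(p) for _, p in items) / len(items),
            "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return features
```

For example, shannon_entropy("abcd") returns 2.0, and english_word_count("helloworld", {"hello", "world", "low"}) returns 3. Entropy and inter-request gaps appear in this sketch only because the description mentions them in connection with the modified DNSExfiltrator; the actual dataset features may be defined differently.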

Created:

2023-07-11