Encrypted Traffic Feature Dataset for Machine Learning and Deep Learning based Encrypted Traffic Analysis
Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/xw7r4tt54g
Description:
This dataset contains balanced amounts of encrypted malicious and encrypted legitimate traffic for encrypted malicious traffic detection and analysis. It is a secondary CSV feature dataset composed from six public traffic datasets.
The dataset was curated according to two criteria. The first is to combine public datasets that are widely used in existing work and contain enough encrypted malicious or encrypted legitimate traffic, such as the Malware Capture Facility Project datasets. The second is to keep encrypted malicious and legitimate network traffic balanced in the final dataset. Six public datasets were selected on this basis.
After data pre-processing, the details of each selected public dataset and the size of each kind of encrypted traffic are given in the “Dataset Statistic Analysis Document”. The document summarizes the amount of malicious and legitimate traffic taken from each selected public dataset, the amount of traffic for each malicious traffic type, and the total traffic size of the composed dataset. The table in that document shows that encrypted malicious and legitimate traffic each make up approximately 50% of the final composed dataset.
The dataset is intended for encrypted malicious traffic detection. Because it is meant for training machine learning or deep learning models, sample train and test sets are also provided, split at a ratio of 1:4. The data can be used for model training and testing either with the selected features as provided or after further pre-processing.
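The two properties described above, an approximately 50/50 class balance and a 1:4 split, can be illustrated with a minimal sketch. It runs on toy rows, not the real files, and the column names `flow_id` and `label`, the label values, and both helper functions are hypothetical. The description does not say how the authors produced their split or which side of the 1:4 ratio is the training set, so the ratio is left as a parameter and the per-class (stratified) split is an assumption.

```python
import random
from collections import Counter, defaultdict


def class_shares(rows, label_key="label"):
    """Return the fraction of rows belonging to each label."""
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def stratified_split(rows, first_parts=1, second_parts=4, label_key="label", seed=0):
    """Split rows into two subsets at first_parts:second_parts within each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    first, second = [], []
    for group in by_label.values():
        rng.shuffle(group)  # shuffle inside each class before cutting
        cut = len(group) * first_parts // (first_parts + second_parts)
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second


# Toy stand-in for the feature CSV: 50 malicious and 50 legitimate rows.
# The column names and label values here are hypothetical.
rows = [
    {"flow_id": i, "label": "malicious" if i % 2 else "legitimate"}
    for i in range(100)
]

small, large = stratified_split(rows)  # 1:4 by default
print(class_shares(rows))
print(len(small), len(large))
```

To try this on the downloaded files, the toy `rows` list could be replaced with rows read through `csv.DictReader`, after checking the actual label column name in the CSV header.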
Created:
2024-01-31



