Datasets used by "Enhancing Website Fingerprinting through Combined Data Augmentation Strategies" (NRA_datasets)

Figshare; updated 2026-03-09; indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Datasets_used_by_i_Enhancing_Website_Fingerprinting_through_Combined_Data_Augmentation_Strategies_i_/31522807


Description:

This repository contains the datasets used in the paper "Enhancing Website Fingerprinting through Combined Data Augmentation Strategies."

Overview

The released datasets are organized according to different stages and settings of the experiments:

- pretrain.zip: used for model pretraining.
- homepage.zip: used for model fine-tuning and testing on homepage traces.
- homepage cached.zip: contains homepage traces collected when browser local caches were reused. These traces may therefore be incomplete compared with full page loads.
- subpages_part_A.zip and subpages_part_B.zip: contain traces collected from multiple subpages under the same domains as the homepage datasets.
- subpages cached.zip: contains cached subpage traces collected under cache-reuse conditions, which may likewise result in incomplete traffic traces.
- background.zip: contains unmonitored background traces used for open-world evaluation.

The homepage.zip and homepage cached.zip datasets share the same set of monitored websites, whereas pretrain.zip contains a disjoint set of websites used exclusively for pretraining.

Dataset Structure

1. Homepage and Pretraining Datasets

For the files homepage.zip, homepage cached.zip, and pretrain.zip, extracting the archive will produce a directory containing multiple files with names in the format:

XXX-YYY

where:

- XXX denotes the website identifier.
- YYY indicates the number of samples contained in the file.

Each file is stored as a pickle-format Python dictionary.

2. Subpage Datasets

For the files subpages_part_A.zip, subpages_part_B.zip, and subpages cached.zip, extracting the archive will generate multiple subdirectories, where each subdirectory name corresponds to a website identifier.

Within each subdirectory, files are also named using the format:

XXX-YYY

In this case:

- XXX denotes the subpage identifier within the same domain.
- YYY indicates the number of samples contained in that file.

These datasets correspond to the same set of domains as the homepage / homepage cached datasets, but contain traces collected from different subpages under each domain.

3. Background Datasets

An additional archive, background.zip, is included as supplementary material for open-world evaluation. It contains 61 pickle-format Python dictionary files, corresponding to approximately 600,000 unmonitored traffic traces.

The files follow the naming convention:

background-XXX-YYY

where XXX and YYY denote the starting and ending indices of the unmonitored traces contained in the file, respectively.

These traces serve as background (unmonitored) samples in the open-world setting described in the paper.

4. Additional File: `others`

Each extracted dataset directory also contains a file named others, which is likewise stored as a pickle-format Python dictionary.

This file contains redundant samples that were not used in the experiments reported in the paper.

Trace Format

Each pickle dictionary file contains three key-value pairs:

- direction
- time
- label

Each key corresponds to a list of equal length, where each element represents a single traffic trace (sample):

- direction: the sequence of packet directions for the trace
- time: the timestamp sequence corresponding to packet transmissions
- label: the label of the website (or subpage) associated with the trace

Thus, each dataset file represents a collection of traces where each sample is described by its direction sequence, timestamp sequence, and class label.
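Reading a file

The file layout and trace format described above can be read with a few lines of Python. The following is a minimal sketch, not code from the paper or the repository: the helper names parse_file_name, load_traces, and iter_samples are hypothetical, and the sketch assumes only what the description states (a pickled dictionary with the keys direction, time, and label, each holding a sequence of equal length). If the values were pickled as NumPy arrays rather than plain lists, NumPy would need to be installed for unpickling to succeed.

```python
import pickle

# The three keys the dataset description says every file contains.
EXPECTED_KEYS = ("direction", "time", "label")


def parse_file_name(name):
    """Split an 'XXX-YYY' file name into (identifier, sample_count).

    Intended for homepage, pretraining, and subpage files only. It raises
    ValueError for names without a numeric suffix, such as 'others'.
    """
    identifier, sep, count = name.rpartition("-")
    if not sep or not identifier or not count.isdigit():
        raise ValueError(f"unexpected file name: {name!r}")
    return identifier, int(count)


def load_traces(path):
    """Load one dataset file and check its three parallel sequences."""
    with open(path, "rb") as fh:
        # pickle can execute arbitrary code: only load files you trust.
        data = pickle.load(fh)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise KeyError(f"{path}: missing keys {missing}")
    lengths = {key: len(data[key]) for key in EXPECTED_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"{path}: unequal lengths {lengths}")
    return data


def iter_samples(data):
    """Yield one (direction, time, label) tuple per trace."""
    yield from zip(data["direction"], data["time"], data["label"])
```

For a background file, whose name has the form background-XXX-YYY, parse_file_name would return the ending index as its second value rather than a sample count, so those names are better handled separately.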

Created:

2026-03-09
