Datasets used by "Enhancing Website Fingerprinting through Combined Data Augmentation Strategies" (NRA_datasets)
Source: Figshare. Updated: 2026-03-09. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Datasets_used_by_i_Enhancing_Website_Fingerprinting_through_Combined_Data_Augmentation_Strategies_i_/31522807
Resource description:
This repository contains the datasets used in the paper “Enhancing Website Fingerprinting through Combined Data Augmentation Strategies.”

Overview

The released datasets are organized according to the stages and settings of the experiments:

- pretrain.zip: used for model pretraining.
- homepage.zip: used for model fine-tuning and testing on homepage traces.
- homepage cached.zip: contains homepage traces collected when browser local caches were reused. These traces may therefore be incomplete compared with full page loads.
- subpages_part_A.zip and subpages_part_B.zip: contain traces collected from multiple subpages under the same domains as the homepage datasets.
- subpages cached.zip: contains subpage traces collected under cache-reuse conditions, which may likewise be incomplete.
- background.zip: contains unmonitored background traces used for open-world evaluation.

The homepage.zip and homepage cached.zip datasets share the same set of monitored websites, whereas pretrain.zip contains a disjoint set of websites used exclusively for pretraining.

Dataset Structure

1. Homepage and Pretraining Datasets

For the files homepage.zip, homepage cached.zip, and pretrain.zip, extracting the archive produces a directory containing multiple files with names in the format:

XXX-YYY

where:

- XXX denotes the website identifier.
- YYY indicates the number of samples contained in the file.

Each file is stored as a pickle-format Python dictionary.

2. Subpage Datasets

For the files subpages_part_A.zip, subpages_part_B.zip, and subpages cached.zip, extracting the archive generates multiple subdirectories, where each subdirectory name corresponds to a website identifier.

Within each subdirectory, files are also named using the format:

XXX-YYY

In this case:

- XXX denotes the subpage identifier within the same domain.
- YYY indicates the number of samples contained in that file.

These datasets cover the same set of domains as the homepage and homepage cached datasets, but contain traces collected from different subpages under each domain.

3. Background Datasets

An additional archive, background.zip, is included as supplementary material for open-world evaluation. It contains 61 pickle-format Python dictionary files, corresponding to approximately 600,000 unmonitored traffic traces.

The files follow the naming convention:

background-XXX-YYY

where XXX and YYY denote the starting and ending indices, respectively, of the unmonitored traces contained in the file.

These traces serve as background (unmonitored) samples in the open-world setting described in the paper.

4. Additional File: `others`

Each extracted dataset directory also contains a file named others, which is likewise stored as a pickle-format Python dictionary.

This file contains redundant samples that were not used in the experiments reported in the paper.

Trace Format

Each pickle dictionary file contains three key-value pairs:

- direction
- time
- label

Each key corresponds to a list, and the three lists have equal length. Each element represents a single traffic trace (sample):

- direction: the sequence of packet directions for the trace
- time: the timestamp sequence corresponding to packet transmissions
- label: the label of the website (or subpage) associated with the trace

Thus, each dataset file represents a collection of traces in which each sample is described by its direction sequence, timestamp sequence, and class label.
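The naming convention and trace format described above can be checked with a short script. The following is a minimal sketch, not the authors' code: the helper names parse_trace_filename and load_trace_file are hypothetical, the demo dictionary is synthetic, and the +1/-1 direction encoding and integer labels are assumptions that the description does not confirm. The parser handles only the XXX-YYY form, not the background-XXX-YYY form.

```python
import pickle
import tempfile
from pathlib import Path

# The three keys named in the dataset description.
EXPECTED_KEYS = {"direction", "time", "label"}


def parse_trace_filename(name):
    """Split an 'XXX-YYY' file name into (identifier, sample_count)."""
    identifier, sep, count = name.rpartition("-")
    if not sep or not identifier or not count.isdigit():
        raise ValueError(f"not an XXX-YYY name: {name!r}")
    return identifier, int(count)


def load_trace_file(path):
    """Load one pickle dictionary and check keys and list lengths."""
    with open(path, "rb") as fh:
        data = pickle.load(fh)  # unpickle only files from a trusted source
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    lengths = {key: len(data[key]) for key in EXPECTED_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"lists differ in length: {lengths}")
    return data


# Synthetic stand-in for a real file; every value below is invented.
demo = {
    "direction": [[1, -1, -1], [1, 1, -1, -1]],  # assumed +1/-1 encoding
    "time": [[0.0, 0.12, 0.15], [0.0, 0.05, 0.31, 0.40]],
    "label": [7, 7],  # assumed integer class labels
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "7-2"  # hypothetical website 7, two samples
    with open(demo_path, "wb") as fh:
        pickle.dump(demo, fh)
    site_id, n_samples = parse_trace_filename(demo_path.name)
    traces = load_trace_file(demo_path)
    # Per the description, YYY should match the number of traces in the file.
    assert n_samples == len(traces["label"])
    print(site_id, n_samples, len(traces["direction"][0]))
```

To use this on the released data, point load_trace_file at a file from an extracted archive instead of the temporary demo file. Because unpickling can execute arbitrary code, it is safest to load only archives downloaded from the official Figshare record.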
Created:
2026-03-09



