A dataset of labelled device Wi-Fi probe requests for MAC address de-randomization - 2021
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/j64btzdsdy
Description:
A Wi-Fi client device can perform a passive scan to detect wireless networks within its radio range by listening for Beacon Frames, i.e., packets issued by the Access Points (APs) to signal their presence. Alternatively, the client can speed up this process by actively searching for a network; to do so, it periodically transmits Probe Request messages, which are management frames of the IEEE 802.11 standard. The process by which these messages are captured is called sniffing. Sniffing can be performed via a Wi-Fi interface set in monitor mode and tuned to the channel on which the transmission occurred (or to an adjacent channel).
Management messages are not encrypted, so they can be used to implement device counting algorithms based on MAC address analysis. However, to prevent tracking of device owners, major operating system producers have developed MAC address randomisation. Devices that change their physical address periodically and randomly challenge counting algorithms, which must then perform an additional address de-randomization step, i.e., cluster the probe requests by source device by analysing appropriate message features.
To the best of our knowledge, our dataset is the only one available with labelled Wi-Fi probe requests, where the label indicates the emitting source. To obtain the labels, the data was collected either in an isolated environment (the anechoic chamber of our department) or in a "noisy" environment (a chamber without particular shielding, but with no other sources of probe requests within a radius of two meters). The first type of data is published after removing only the packets originating from the MAC address of the Raspberry Pi's embedded interface; the second type of data has been additionally filtered to simulate the anechoic chamber environment. Each capture file has a duration of 20 minutes and covers three non-overlapping channels (channels 1, 6, and 11) simultaneously.
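As an illustration of why plain MAC-based counting breaks down, the sketch below separates source addresses by the locally administered (U/L) bit of the first octet, which randomized addresses normally set. It is only a common heuristic, not the filtering or labelling procedure used for this dataset, and the function names and example addresses are hypothetical.

```python
def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' (or '-' separated) into 6 raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def is_locally_administered(mac: str) -> bool:
    """Return True if the U/L bit (0x02) of the first octet is set.

    Randomized addresses are locally administered, so this bit is a
    common (but not infallible) hint that an address is randomized.
    """
    return bool(parse_mac(mac)[0] & 0x02)


def split_by_address_type(source_macs):
    """Split distinct source addresses into (global, locally administered) sets."""
    global_macs, local_macs = set(), set()
    for mac in source_macs:
        normalized = mac.lower().replace("-", ":")
        target = local_macs if is_locally_administered(normalized) else global_macs
        target.add(normalized)
    return global_macs, local_macs


# Hypothetical source addresses taken from captured probe requests.
observed = [
    "00:1a:2b:3c:4d:5e",  # globally unique address
    "da:a1:19:00:00:01",  # locally administered, likely randomized
    "02:00:00:00:00:01",  # locally administered, likely randomized
    "DA-A1-19-00-00-01",  # same address as above, different notation
]
global_macs, local_macs = split_by_address_type(observed)
```

Counting the distinct globally unique addresses is straightforward, whereas the locally administered set may contain many addresses emitted by a single device; grouping those is the de-randomization problem the labels are meant to support.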
The dataset contains Probe Requests from 22 different devices, each observed separately in 6 different modes, including settings based on display status, Wi-Fi connection, and power saving. In total, we collected 315 non-empty files; captures that were empty after filtering were removed. The device used for the capture is a Raspberry Pi with three Wi-Fi dongle interfaces installed, each used to collect data from one channel. The main characteristic of the dataset is its subdivision by device, which enables a more accurate analysis of the behavior of individual devices in different modes. Moreover, the labelled data can be used to train Machine Learning algorithms or to verify the correct functioning of algorithms that aim to count devices through probe request analysis in the presence of random MAC addresses.
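A minimal sketch of how labelled probe requests could be used to check a counting algorithm: given the true emitting device of each probe request and the cluster a de-randomization algorithm assigned to it, compare the estimated device count with the true one and compute cluster purity. The function name, labels, and cluster ids are hypothetical; purity is just one possible metric, not one prescribed by the dataset authors.

```python
from collections import Counter, defaultdict


def evaluate_clustering(true_devices, predicted_clusters):
    """Compare predicted clusters with ground-truth device labels.

    Both arguments are sequences with one entry per probe request.
    """
    if len(true_devices) != len(predicted_clusters):
        raise ValueError("one label and one cluster id are needed per probe request")

    # For each predicted cluster, count how many probes came from each device.
    members = defaultdict(Counter)
    for device, cluster in zip(true_devices, predicted_clusters):
        members[cluster][device] += 1

    # Purity: fraction of probes that belong to the majority device of their cluster.
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    purity = majority_total / len(true_devices) if true_devices else 0.0

    return {
        "true_device_count": len(set(true_devices)),
        "estimated_device_count": len(members),
        "purity": purity,
    }


# Hypothetical example: six probe requests from three labelled devices.
true_devices = ["phone_a", "phone_a", "phone_a", "phone_b", "phone_b", "laptop_c"]
predicted_clusters = [0, 0, 1, 2, 2, 2]
result = evaluate_clustering(true_devices, predicted_clusters)
```

In this example the estimated count matches the true count even though the clustering is wrong (one device is split and two are merged), which is why a per-probe metric such as purity is worth reporting alongside the count.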
Note: In version 2, all device directories have been moved inside the folder "Individual devices" and renamed. Moreover, we added the link to a new database published in 2024.
Created:
2025-07-24



