A dataset of labelled device Wi-Fi probe requests for MAC address de-randomization - 2021
Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/j64btzdsdy
Description:
A Wi-Fi client device can perform a passive scan to detect wireless networks within its radio range by listening for Beacon frames, i.e. packets issued by Access Points (APs) to signal their presence. Alternatively, the client can speed up this process by actively searching for a network: to do so, it periodically transmits Probe Request messages, which are management frames defined by the IEEE 802.11 standard. Capturing these messages is called sniffing and can be performed with a Wi-Fi interface set to monitor mode and tuned to the channel on which the transmission happened (or to an adjacent channel). Neither kind of message is encrypted, so both can be used to implement device-counting algorithms based on MAC address analysis. However, to prevent tracking of device owners, the major operating system vendors have introduced MAC address randomization. Devices that change their physical address periodically and at random pose a challenge to counting algorithms, which must then perform an additional address de-randomization step, i.e. cluster the probe requests by source device by analysing appropriate message features. This problem has no straightforward solution, and further research is needed to de-randomize traces reliably.
To the best of our knowledge, our dataset is the only one available with labelled Wi-Fi probe requests, where the label indicates the emitting device. To obtain the labels, the data were collected either in an isolated environment (the anechoic chamber of our department) or in a "noisy" environment (a room without particular shielding, but in any case with no other source of probe requests within a radius of two meters). Data of the first type are published after removing packets that carry the MAC address of the Raspberry Pi's embedded interface; data of the second type have been filtered to simulate the anechoic chamber environment.
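As background for the de-randomization step described above: randomized addresses typically have the locally administered bit set, which is the second-least-significant bit of the first octet of an IEEE 802 MAC address. The following is a minimal Python sketch, not part of the dataset's tooling, that assumes source addresses have already been extracted from the captures as colon-separated hex strings. The function names and the sample addresses are hypothetical.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the locally administered (U/L) bit of the MAC is set."""
    first_octet = int(mac.split(":")[0], 16)
    # Bit 1 (value 0x02) of the first octet is the U/L bit.
    return bool(first_octet & 0x02)


def split_by_address_type(macs):
    """Split addresses into (globally unique, locally administered) sets."""
    global_macs, local_macs = set(), set()
    for mac in macs:
        normalized = mac.lower()
        if is_locally_administered(normalized):
            local_macs.add(normalized)
        else:
            global_macs.add(normalized)
    return global_macs, local_macs


# Made-up sample addresses, for illustration only.
sample = [
    "da:a1:19:00:00:01",  # first octet 0xda: U/L bit set
    "3c:5a:b4:00:00:02",  # first octet 0x3c: U/L bit clear
    "DA:A1:19:00:00:01",  # same address as the first, different case
]
global_macs, local_macs = split_by_address_type(sample)
```

Addresses in the second set cannot simply be counted as distinct devices, since one device may emit many of them; those are the probe requests that need clustering by other frame features.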
Each capture file has a duration of 20 minutes and covers the three non-overlapping channels (1, 6 and 11) simultaneously. The dataset contains Probe Requests from 22 different devices, each observed separately in 6 different modes that combine settings for display status, Wi-Fi connection and power saving. We collected 315 non-empty files in total; captures that were completely empty after filtering were removed. The capture device is a Raspberry Pi with three Wi-Fi dongles installed, each collecting data from one channel. The main characteristic of the dataset is its subdivision by device, which allows a more accurate analysis of the behavior of individual devices in different modes. Moreover, the labelled data can be used to train Machine Learning algorithms, or to verify algorithms that count devices through probe request analysis in the presence of random MAC addresses.
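One way the labels could be used to verify a de-randomization algorithm is to compare its cluster assignments with the true emitting devices. The sketch below is an illustration under stated assumptions, not the authors' evaluation method: it assumes each probe request has a ground-truth device label and a predicted cluster id, and it computes cluster purity plus the signed error in the device count. All names and sample values are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, cluster_ids):
    """Fraction of probes that belong to the majority device of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        return 0.0
    members = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster][label] += 1
    # For each cluster, count the probes from its most frequent device.
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    return majority_total / len(true_labels)


def count_error(true_labels, cluster_ids):
    """Estimated device count minus true device count (0 means exact)."""
    return len(set(cluster_ids)) - len(set(true_labels))


# Made-up example: five probes from two devices, grouped into two clusters.
true_labels = ["phone_a", "phone_a", "phone_a", "phone_b", "phone_b"]
cluster_ids = [0, 0, 1, 1, 1]
purity = cluster_purity(true_labels, cluster_ids)
error = count_error(true_labels, cluster_ids)
```

Purity alone rewards over-splitting (one cluster per probe scores 1.0), which is why the sketch reports the count error alongside it.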
Created:
2024-01-23



