EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching
NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/3882103
Description:
EyeFi Dataset
This dataset was collected as part of the EyeFi project at Bosch Research and Technology Center, Pittsburgh, PA, USA. It contains WiFi CSI values of human motion trajectories along with ground-truth location information captured through a camera. The dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop, which describes the data collection in detail. Please check it out for more information on the dataset.
Clarification/Bug report: Please note that the order of antennas and subcarriers in the .h5 files was not stated clearly in the README.md file. The order for the 90 `csi_real` and `csi_imag` values is: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3,… subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. Please see the description below. The newer version of the dataset contains this information in README.md. We are sorry for the inconvenience.
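The ordering above means that each 90-value row can be viewed as a 30 x 3 (subcarrier x antenna) matrix. The following is a minimal sketch with NumPy, assuming `csi_real` and `csi_imag` have been loaded as arrays with 90 values per packet; the helper name `to_complex_csi` is illustrative and not part of the dataset tooling.

```python
import numpy as np

N_SUBCARRIERS = 30
N_ANTENNAS = 3

def to_complex_csi(csi_real, csi_imag):
    """Combine real/imag parts and reshape to (packets, subcarriers, antennas).

    Assumes each row holds 90 values ordered
    [sc1-ant1, sc1-ant2, sc1-ant3, sc2-ant1, ...], i.e. the antenna index
    varies fastest, as stated in the clarification above.
    """
    real = np.asarray(csi_real, dtype=float)
    imag = np.asarray(csi_imag, dtype=float)
    csi = real + 1j * imag
    # Row-major reshape: the last axis (antenna) varies fastest,
    # which matches the stated ordering.
    return csi.reshape(-1, N_SUBCARRIERS, N_ANTENNAS)
```

With this layout, `csi[:, k, a]` would be the CSI of subcarrier `k + 1` on antenna `a + 1` for every packet.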
Data Collection Setup
In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC and the Linux CSI tools [1] to extract the WiFi CSI packets. The (x,y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 camera mounted on the ceiling, and Angles of Arrival (AoAs) are derived from the (x,y) coordinates. The WiFi card and the camera are located at the same origin coordinates but at different heights: the camera is located around 2.85m above the ground and the WiFi antennas are around 1.12m above the ground.
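Because the camera and the WiFi antennas share the same origin, a camera-derived AoA can in principle be computed from a subject's (x,y) position with basic trigonometry. The sketch below is an illustration only: the axis convention (antenna array along the x-axis, 0 degrees at broadside along +y, positive angles toward +x), the use of degrees, and the omission of the height difference are all assumptions, not details given in this description, and `coords_to_aoa_deg` is a hypothetical helper.

```python
import math

def coords_to_aoa_deg(x, y):
    """Angle of arrival in degrees for a subject at (x, y), origin at the antennas.

    Assumed convention (not specified by the dataset description):
    antenna array along the x-axis, 0 degrees at broadside (+y direction),
    positive angles toward +x. The height difference is ignored.
    """
    return math.degrees(math.atan2(x, y))
```

The `cam_aoa` values stored in the dataset should be treated as the reference; this sketch only shows the kind of computation involved.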
The data collection environment consists of two areas: the first is a rectangular space measuring 11.8m x 8.74m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74m and 14.24m between two walls. The kitchen also has numerous obstacles and different materials with different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.
To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone are used in both the lab and the kitchen area.
List of Files
Here is a list of files included in the dataset:
|- 1_person
    |- 1_person_1.h5
    |- 1_person_2.h5
|- 2_people
    |- 2_people_1.h5
    |- 2_people_2.h5
    |- 2_people_3.h5
|- 3_people
    |- 3_people_1.h5
    |- 3_people_2.h5
    |- 3_people_3.h5
|- 5_people
    |- 5_people_1.h5
    |- 5_people_2.h5
    |- 5_people_3.h5
    |- 5_people_4.h5
|- 10_people
    |- 10_people_1.h5
    |- 10_people_2.h5
    |- 10_people_3.h5
|- Kitchen
    |- 1_person
        |- kitchen_1_person_1.h5
        |- kitchen_1_person_2.h5
        |- kitchen_1_person_3.h5
    |- 3_people
        |- kitchen_3_people_1.h5
|- training
    |- shuffuled_train.h5
    |- shuffuled_valid.h5
    |- shuffuled_test.h5
View-Dataset-Example.ipynb
README.md
In this dataset, the folders `1_person/`, `2_people/`, `3_people/`, `5_people/`, and `10_people/` contain data collected from the lab area, whereas the `Kitchen/` folder contains data collected from the kitchen area. To see how each file is structured, please see the section Access the data below.
The `training/` folder contains the training dataset we used to train the neural network discussed in our paper. The files are generated by shuffling all the data from the `1_person/` folder collected in the lab area (`1_person_1.h5` and `1_person_2.h5`).
Why multiple files in one folder?
Each folder contains multiple files. For example, the `1_person` folder has two files: `1_person_1.h5` and `1_person_2.h5`. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person holding the phone can be different. Also, the data may have been collected on different days, and/or the data collection system may have been rebooted due to stability issues. As a result, we provide different files (like `1_person_1.h5`, `1_person_2.h5`) to distinguish between different people holding the phone and possible system reboots that introduce different phase offsets (see below) in the system.
Special note:
In `1_person_1.h5`, the same person holds the phone throughout, whereas `1_person_2.h5` contains different people holding the phone, with only one person present in the area at a time. The two files were also collected on different days.
Access the data
To access the data, an HDF5 library is needed to open the dataset. A free HDF5 viewer is available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide an example Python notebook, View-Dataset-Example.ipynb, to demonstrate how to access the data.
Each file (except the files under the `training/` folder) is structured as follows:
|- csi_imag
|- csi_real
|- nPaths_1
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_2
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_3
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_4
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- num_obj
|- obj_0
    |- cam_aoa
    |- coordinates
|- obj_1
    |- cam_aoa
    |- coordinates
...
|- timestamp
The `csi_real` and `csi_imag` datasets are the real and imaginary parts of the CSI measurements. The order of antennas and subcarriers for the 90 `csi_real` and `csi_imag` values is: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3,… subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The `nPaths_x` groups hold the WiFi Angle of Arrival (AoA) calculated by SpotFi [2], where `x` is the number of multiple paths specified during the calculation. Under each `nPaths_x` group are `offset_xx` subgroups, where `xx` stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:
| Antennas | Offset 1 (rad) | Offset 2 (rad) |
|:-------:|:---------------:|:-------------:|
| 1 & 2 | 1.1899 | -2.0071 |
| 1 & 3 | 1.3883 | -1.8129 |
The measurement is based on the work in [3], where the authors state that there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combinations of the offsets are used for the `offset_xx` naming. For example, `offset_12` means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 are used in the SpotFi calculation.
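The naming scheme can be sketched in code as follows. This is an illustration under stated assumptions: treating digit 0 (as in `offset_00`) as "no correction", and correcting by rotating antennas 2 and 3 by the negative of their offset relative to antenna 1, are both guesses about the convention rather than facts from this description. The helper names `offsets_for` and `apply_offsets` are hypothetical.

```python
import numpy as np

# Measured phase offsets in radians, taken from the table above.
OFFSETS_ANT_1_2 = {1: 1.1899, 2: -2.0071}
OFFSETS_ANT_1_3 = {1: 1.3883, 2: -1.8129}

def offsets_for(name):
    """Map a group name such as 'offset_12' to (offset_1_2, offset_1_3) in radians.

    Assumption: digit 0 (as in 'offset_00') means no correction for that pair.
    """
    first, second = (int(ch) for ch in name.split("_")[1])
    return OFFSETS_ANT_1_2.get(first, 0.0), OFFSETS_ANT_1_3.get(second, 0.0)

def apply_offsets(csi, name):
    """Rotate antennas 2 and 3 by the negative of their offset relative to antenna 1.

    `csi` is a complex array with the 3 antennas on the last axis.
    The sign of the rotation is an assumption about how the offsets are defined.
    """
    off12, off13 = offsets_for(name)
    rotation = np.exp(-1j * np.array([0.0, off12, off13]))
    return np.asarray(csi) * rotation
```

For example, `offsets_for("offset_21")` selects offset 2 for antennas 1 & 2 and offset 1 for antennas 1 & 3.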
The `num_obj` field stores the number of human subjects present in the scene. `obj_0` is always the subject holding the phone. Each file contains `num_obj` groups named `obj_x`. For each `obj_x`, we provide the `coordinates` reported by the camera and `cam_aoa`, the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoA listed here are chronologically ordered (except in the files in the `training` folder). They reflect how the person carrying the phone moved in the space (for `obj_0`) and how everyone else walked (for the other `obj_y`, where `y` > 0).
The `timestamp` is provided as a time reference for each WiFi packet.
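As a sanity check, the timestamps can be used to estimate the average packet rate of a recording. The sketch below is hedged: the unit of `timestamp` is not stated in this description, so the `ticks_per_second` parameter (a hypothetical name) is exposed and defaults to assuming seconds. It expects a flat, chronologically ordered sequence.

```python
def packet_rate(timestamps, ticks_per_second=1.0):
    """Average packets per second over a chronologically ordered timestamp sequence.

    `ticks_per_second` converts the timestamp unit to seconds. The default of
    1.0 assumes timestamps are already in seconds, which the dataset
    description does not state.
    """
    ts = [float(t) for t in timestamps]
    if len(ts) < 2:
        raise ValueError("need at least two timestamps")
    duration = (ts[-1] - ts[0]) / ticks_per_second
    if duration <= 0:
        raise ValueError("timestamps must be increasing")
    # N timestamps span N - 1 inter-packet intervals.
    return (len(ts) - 1) / duration
```

If the unit assumption holds, the result for a lab or kitchen file should fall near the 20-25 packets per second mentioned in the setup description.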
To access the data (Python):
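import h5py

# Open one recording read-only; the context manager closes the file afterwards.
with h5py.File('3_people_3.h5', 'r') as data:
    csi_real = data['csi_real'][()]           # real part of CSI, 90 values per packet
    csi_imag = data['csi_imag'][()]           # imaginary part of CSI
    cam_aoa = data['obj_0/cam_aoa'][()]       # camera-derived AoA of the phone holder
    cam_loc = data['obj_0/coordinates'][()]   # camera (x,y) coordinates of the phone holder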
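Beyond the top-level datasets, the per-subject groups and the SpotFi groups can be read with small helpers such as the ones sketched here. The function names are hypothetical, and the code assumes `num_obj` is stored as a scalar or a one-element array. The helpers accept an open `h5py.File` or any nested mapping with the same keys, which makes them easy to try without the real files.

```python
import numpy as np

def read_subjects(f):
    """Return a list of (cam_aoa, coordinates) arrays, one pair per subject.

    `f` is an open h5py.File (or a nested mapping with the same keys).
    Index 0 is the subject holding the phone.
    """
    # Assumption: num_obj is a scalar or a one-element array.
    n = int(np.asarray(f["num_obj"]).ravel()[0])
    subjects = []
    for i in range(n):
        group = f[f"obj_{i}"]
        subjects.append(
            (np.asarray(group["cam_aoa"]), np.asarray(group["coordinates"]))
        )
    return subjects

def read_spotfi_aoa(f, n_paths, offset):
    """Return the SpotFi AoA array for e.g. n_paths=2, offset='12'."""
    return np.asarray(f[f"nPaths_{n_paths}"][f"offset_{offset}"]["spotfi_aoa"])
```

For a real file, `f` would be the object returned by `h5py.File('3_people_3.h5', 'r')`; `read_subjects(f)[0]` then corresponds to `obj_0`.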
For files inside the `training/` folder:
Files inside the `training/` folder have a different data structure:
|- nPath-1
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-2
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-3
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-4
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
The group name `nPath-x` indicates the number of multiple paths specified during the SpotFi calculation. `aoa` is the camera-generated angle of arrival (AoA), which can be considered ground truth; `csi_imag` and `csi_real` are the imaginary and real components of the CSI values; and `spotfi` holds the SpotFi-calculated AoA values. The SpotFi values are chosen based on the lowest median and mean error across `1_person_1.h5` and `1_person_2.h5`. All rows under the same `nPath-x` group are aligned (i.e., the first row of `aoa` corresponds to the first row of `csi_imag`, `csi_real`, and `spotfi`). There is no timestamp recorded, and the sequence of the data is not chronological because the rows are randomly shuffled from the `1_person_1.h5` and `1_person_2.h5` files.
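The selection of SpotFi values by lowest median and mean error can be illustrated with a short sketch. This is not the authors' selection code: ranking by median first with the mean as a tie-breaker is an assumption about how the two statistics were combined, angle wraparound is ignored, and the names `aoa_errors` and `best_candidate` are hypothetical.

```python
import statistics

def aoa_errors(spotfi_aoa, cam_aoa):
    """Return (median, mean) absolute error between SpotFi and camera AoA, row by row."""
    errors = [abs(s - c) for s, c in zip(spotfi_aoa, cam_aoa)]
    return statistics.median(errors), statistics.fmean(errors)

def best_candidate(candidates, cam_aoa):
    """Pick the candidate label with the lowest (median, mean) error.

    `candidates` maps a label such as 'offset_12' to its SpotFi AoA sequence.
    Ranking by median first and mean second is an assumption.
    """
    return min(candidates, key=lambda name: aoa_errors(candidates[name], cam_aoa))
```

The same error function can be applied to the aligned `spotfi` and `aoa` rows of a training file to see how far SpotFi deviates from the camera reference.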
Citation
If you use the dataset, please cite our paper:
@inproceedings{eyefi2020,
title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
year={2020},
organization={IEEE}
}
Thanks!
References
1. Halperin, Daniel, et al. "Tool release: Gathering 802.11n traces with channel state information." ACM SIGCOMM Computer Communication Review 41.1 (2011): 53-53.
2. Kotaru, Manikanta, et al. "SpotFi: Decimeter level localization using WiFi." Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. 2015.
3. Zhang, Dongheng, et al. "Calibrating Phase Offsets for Commodity WiFi." IEEE Systems Journal (2019).
Created: 2022-12-04



