
gadgadgad/OfficeLocalization

Source: Hugging Face · Updated 2026-04-14 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/gadgadgad/OfficeLocalization
Resource description:
---
language:
- en
license: cc-by-4.0
task_categories:
- time-series-forecasting
tags:
- wifi-sensing
- csi
- indoor-localization
- occupancy-detection
- esp32
- smart-home
- indoor-sensing
pretty_name: "Office Localization — WiFi CSI Indoor Localization (Office)"
size_categories:
- 1M<n<10M
---

# Office Localization — WiFi CSI Indoor Localization (Office Environment)

## Dataset Description

**Office Localization** is a WiFi Channel State Information (CSI) dataset for zone-level indoor localization and occupancy region detection, collected in an office environment using two ESP32-C6 microcontrollers operating as commodity 802.11n access points. It contains **4 region/occupancy classes** recorded across **2 temporal sessions per class**, totaling approximately **1.6 million CSI packets** and **~124 minutes** of continuous recording.

This dataset is part of the research paper:

> **WiFi Sensing-Based Human Activity Recognition For Smart Home Applications Using Commodity Access Points**
> Gad Gad, Iqra Batool, Mostafa M. Fouda, Shikhar Verma, Zubair Md Fadlullah
> IEEE, 2026

📄 [Paper](https://gadm21.github.io/WifiSensingESP32HAR/IEEE_2026__wifi_sensing_.pdf) · ⚡ [GitHub](https://github.com/gadm21/WifiSensingESP32HAR) · 🌐 [Project Page](https://gadm21.github.io/WifiSensingESP32HAR/)

## Region / Occupancy Classes

| Label | Description |
|-------|-------------|
| `empty` | No human present in the sensing area |
| `one` | Person present in Zone 1 of the office |
| `two` | Person present in Zone 2 of the office |
| `five` | Person present in Zone 5 of the office |

The zone labels correspond to distinct spatial regions within the office. The task is to determine **where** a person is located (or if the room is empty) based solely on how their body perturbs the WiFi channel between the transmitter and receiver.

## Collection Setup

| Parameter | Value |
|-----------|-------|
| **Hardware** | 2 × ESP32-C6 (TX: AP mode, RX: STA mode) |
| **WiFi Standard** | 802.11n, 20 MHz bandwidth, HT-LTF |
| **Subcarriers** | 64 total (52 LLTF data subcarriers extracted) |
| **Packet Rate** | ~200 packets/sec (irregular, resampled to 150 Hz) |
| **Transport** | UART serial @ 115200 baud |
| **Environment** | Office room with desks, chairs, and typical office furniture |
| **TX–RX Distance** | ~3 meters, line-of-sight |
| **Recorded** | October 2025 |

## Data Organization

| File | Label | Split | Approx. Packets |
|------|-------|-------|-----------------|
| `empty_1.csv` | empty | Train | ~210K |
| `empty_2.csv` | empty | Test | ~210K |
| `five_1.csv` | five | Train | ~150K |
| `five_2.csv` | five | Test | ~150K |
| `one_1.csv` | one | Train | ~150K |
| `one_2.csv` | one | Test | ~150K |
| `two_1.csv` | two | Train | ~150K |
| `two_2.csv` | two | Test | ~150K |

**Split strategy**: File-based temporal holdout. The first recording session per label is used for training and the second for testing. This ensures the model generalizes to temporally distinct data collected at a different time.

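
The file-based split described under Data Organization can be written down as a small manifest. The following is a minimal sketch, not code from the paper: the function name `build_manifest`, the `root` directory argument, and the integer label order are hypothetical choices made for illustration. Only the file names, labels, and the session-to-split rule come from the table above.

```python
from pathlib import Path

# Class names as listed in the dataset card; the index order is an assumption.
LABELS = ("empty", "one", "two", "five")
# Session 1 is the training recording, session 2 the held-out test recording.
SPLIT_BY_SESSION = {1: "train", 2: "test"}


def build_manifest(root="."):
    """Return {split: [(csv_path, label_index), ...]} for the eight CSV files."""
    manifest = {"train": [], "test": []}
    for idx, label in enumerate(LABELS):
        for session, split in SPLIT_BY_SESSION.items():
            path = Path(root) / f"{label}_{session}.csv"
            manifest[split].append((path, idx))
    return manifest
```

Keeping the split keyed on the session number, rather than shuffling packets, preserves the temporal holdout: no packet from a test recording can leak into training.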
## CSV Format

Each CSV file contains one row per received CSI packet with the following columns:

| Column | Description |
|--------|-------------|
| `type` | Packet type (always `CSI_DATA`) |
| `seq` | Sequence number / local timestamp |
| `mac` | Transmitter MAC address |
| `rssi` | Received Signal Strength Indicator (dBm) |
| `rate` | PHY rate index |
| `noise_floor` | Noise floor estimate (dBm) |
| `fft_gain` | FFT gain applied by hardware |
| `agc_gain` | Automatic Gain Control value |
| `channel` | WiFi channel number |
| `local_timestamp` | ESP32 local timestamp (µs) |
| `sig_len` | Signal length |
| `rx_state` | Receiver state |
| `len` | CSI data length (128 = 64 subcarriers × 2 components) |
| `first_word` | Header word |
| `data` | Raw CSI data as `[I₀, Q₀, I₁, Q₁, ..., I₆₃, Q₆₃]`: 128 signed integers representing in-phase and quadrature components for 64 subcarriers |

## Recommended Preprocessing Pipeline

1. **Load** CSV and parse the `data` column into complex I/Q arrays
2. **Select** 52 LLTF subcarriers (discard guard/null subcarriers)
3. **Resample** to a uniform 150 Hz sample rate (original rate is irregular ~100–200 Hz)
4. **Feature extraction**: Rolling variance with window W ∈ {20, 200, 2000} (recommended: W=200)
5. **Windowing**: Segment into fixed-length windows (e.g., 100 samples = 0.67s at 150 Hz)

## Benchmark Results

Best results from the paper using rolling-variance features (W=200):

| Classifier | Accuracy |
|-----------|----------|
| Random Forest | 89.1% |
| XGBoost | 88.6% |
| Conv1D | 95.7% |
| CNN-LSTM | 96.7% |
| PCA + KNN | 84.1% |

Office Localization achieves excellent results with deep learning models, demonstrating that commodity WiFi CSI can perform zone-level indoor localization without any dedicated infrastructure: just two off-the-shelf ESP32-C6 boards.

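
The five steps listed under Recommended Preprocessing Pipeline can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names are hypothetical. The subcarrier index set `ASSUMED_LLTF_IDX` assumes an FFT-order layout that should be checked against the ESP32-C6 CSI documentation. Taking the amplitude before resampling, using linear interpolation, and cutting non-overlapping segments are also assumptions, since the card does not specify them.

```python
import json
import numpy as np

FS = 150.0      # target sample rate in Hz (from the dataset card)
ROLL_W = 200    # rolling-variance window recommended by the card
SEG_LEN = 100   # samples per segment, about 0.67 s at 150 Hz

# ASSUMPTION: FFT-order layout (indices 0..31 -> subcarriers 0..+31,
# indices 32..63 -> subcarriers -32..-1); data subcarriers are +/-1..26.
ASSUMED_LLTF_IDX = np.r_[1:27, 38:64]


def parse_iq(cell):
    """Turn one `data` cell "[I0, Q0, I1, Q1, ...]" into complex values."""
    vals = np.asarray(json.loads(cell), dtype=np.float64)
    return vals[0::2] + 1j * vals[1::2]


def resample_uniform(t_us, x, fs=FS):
    """Linearly interpolate each column of x onto a uniform fs-Hz grid.

    Assumes timestamps are strictly increasing (no counter wrap-around).
    """
    t = np.asarray(t_us, dtype=np.float64)
    t = (t - t[0]) * 1e-6                          # microseconds -> seconds
    grid = np.arange(0.0, t[-1], 1.0 / fs)
    return np.column_stack([np.interp(grid, t, col) for col in x.T])


def rolling_variance(x, w=ROLL_W):
    """Sample variance over a sliding window of w rows, per column."""
    x = x - x.mean(axis=0)                         # centre to limit cancellation
    pad = np.zeros((1, x.shape[1]))
    s1 = np.cumsum(np.vstack([pad, x]), axis=0)
    s2 = np.cumsum(np.vstack([pad, x * x]), axis=0)
    win1 = s1[w:] - s1[:-w]                        # windowed sum
    win2 = s2[w:] - s2[:-w]                        # windowed sum of squares
    return np.maximum(win2 - win1 ** 2 / w, 0.0) / (w - 1)


def segment(x, seg_len=SEG_LEN):
    """Cut x into non-overlapping (seg_len, features) windows, dropping the tail."""
    n = (len(x) // seg_len) * seg_len
    return x[:n].reshape(-1, seg_len, x.shape[1])


def preprocess(timestamps_us, data_cells, keep=ASSUMED_LLTF_IDX):
    """Steps 1-5: parse, select subcarriers, resample, rolling variance, window."""
    csi = np.stack([parse_iq(c) for c in data_cells])   # (packets, 64) complex
    amp = np.abs(csi[:, keep])                           # (packets, 52) amplitude
    amp = resample_uniform(timestamps_us, amp)
    return segment(rolling_variance(amp))                # (windows, 100, 52)
```

One way to feed it is to read a file with a CSV reader and pass the `local_timestamp` and `data` columns to `preprocess`. This assumes the `data` field is quoted so that the reader keeps the bracketed list as a single cell.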
## Use Cases

- **Smart building management**: Automatically determine which zones are occupied
- **Energy optimization**: Zone-aware HVAC and lighting control
- **Elderly care**: Non-intrusive monitoring of movement between rooms/zones
- **Security**: Detect unauthorized presence in restricted zones

## License

This dataset is released under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

Provider:
gadgadgad