gadgadgad/OfficeLocalization
Source: Hugging Face · Updated 2026-04-14 · Indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/gadgadgad/OfficeLocalization
---
language:
- en
license: cc-by-4.0
task_categories:
- time-series-forecasting
tags:
- wifi-sensing
- csi
- indoor-localization
- occupancy-detection
- esp32
- smart-home
- indoor-sensing
pretty_name: "Office Localization — WiFi CSI Indoor Localization (Office)"
size_categories:
- 1M<n<10M
---
# Office Localization — WiFi CSI Indoor Localization (Office Environment)
## Dataset Description
**Office Localization** is a WiFi Channel State Information (CSI) dataset for zone-level indoor localization and occupancy region detection. It was collected in an office environment using two ESP32-C6 microcontrollers that form a commodity 802.11n link: one runs as an access point (TX) and the other as a station (RX). The dataset contains **4 region/occupancy classes**, each recorded in **2 separate sessions**, for a total of approximately **1.6 million CSI packets** and **~124 minutes** of continuous recording.
This dataset is part of the research paper:
> **WiFi Sensing-Based Human Activity Recognition For Smart Home Applications Using Commodity Access Points**
> Gad Gad, Iqra Batool, Mostafa M. Fouda, Shikhar Verma, Zubair Md Fadlullah
> IEEE, 2026
📄 [Paper](https://gadm21.github.io/WifiSensingESP32HAR/IEEE_2026__wifi_sensing_.pdf) · ⚡ [GitHub](https://github.com/gadm21/WifiSensingESP32HAR) · 🌐 [Project Page](https://gadm21.github.io/WifiSensingESP32HAR/)
## Region / Occupancy Classes
| Label | Description |
|-------|-------------|
| `empty` | No human present in the sensing area |
| `one` | Person present in Zone 1 of the office |
| `two` | Person present in Zone 2 of the office |
| `five` | Person present in Zone 5 of the office |
The zone labels correspond to distinct spatial regions within the office. The task is to determine **where** a person is located (or if the room is empty) based solely on how their body perturbs the WiFi channel between the transmitter and receiver.
## Collection Setup
| Parameter | Value |
|-----------|-------|
| **Hardware** | 2 × ESP32-C6 (TX: AP mode, RX: STA mode) |
| **WiFi Standard** | 802.11n, 20 MHz bandwidth, HT-LTF |
| **Subcarriers** | 64 total (52 LLTF data subcarriers extracted) |
| **Packet Rate** | Up to ~200 packets/sec, irregular (~100–200 Hz); resampled to 150 Hz |
| **Transport** | UART serial @ 115200 baud |
| **Environment** | Office room with desks, chairs, and typical office furniture |
| **TX–RX Distance** | ~3 meters, line-of-sight |
| **Recorded** | October 2025 |
## Data Organization
| File | Label | Split | Approx. Packets |
|------|-------|-------|-----------------|
| `empty_1.csv` | empty | Train | ~210K |
| `empty_2.csv` | empty | Test | ~210K |
| `five_1.csv` | five | Train | ~150K |
| `five_2.csv` | five | Test | ~150K |
| `one_1.csv` | one | Train | ~150K |
| `one_2.csv` | one | Test | ~150K |
| `two_1.csv` | two | Train | ~150K |
| `two_2.csv` | two | Test | ~150K |
**Split strategy**: File-based temporal holdout. The first recording session of each label is used for training and the second for testing. The test set therefore measures how well a model generalizes to data recorded at a different time, not to neighboring windows from the same session.
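Below is a minimal sketch of the file-based split. It relies only on the `<label>_<session>.csv` naming shown in the table. The helper names `split_for` and `partition` are hypothetical and do not come from the authors' code.

```python
from __future__ import annotations

from pathlib import Path

# Labels as listed in the Region / Occupancy Classes table.
LABELS = ("empty", "one", "two", "five")
# Session 1 is used for training, session 2 for testing.
SESSION_TO_SPLIT = {"1": "train", "2": "test"}


def split_for(filename: str) -> tuple[str, str]:
    """Return (label, split) for a file named '<label>_<session>.csv'."""
    stem = Path(filename).stem                 # e.g. "five_2"
    label, _, session = stem.rpartition("_")   # ("five", "_", "2")
    if label not in LABELS or session not in SESSION_TO_SPLIT:
        raise ValueError(f"unexpected file name: {filename!r}")
    return label, SESSION_TO_SPLIT[session]


def partition(filenames) -> dict[str, list[tuple[str, str]]]:
    """Assign whole files (never individual windows) to the train or test list."""
    parts: dict[str, list[tuple[str, str]]] = {"train": [], "test": []}
    for name in sorted(filenames):
        label, split = split_for(name)
        parts[split].append((name, label))
    return parts
```

Splitting by file rather than by window prevents temporally adjacent, highly correlated windows from appearing on both sides of the split. If they did, test accuracy would be inflated.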
## CSV Format
Each CSV file contains one row per received CSI packet with the following columns:
| Column | Description |
|--------|-------------|
| `type` | Packet type (always `CSI_DATA`) |
| `seq` | Sequence number / local timestamp |
| `mac` | Transmitter MAC address |
| `rssi` | Received Signal Strength Indicator (dBm) |
| `rate` | PHY rate index |
| `noise_floor` | Noise floor estimate (dBm) |
| `fft_gain` | FFT gain applied by hardware |
| `agc_gain` | Automatic Gain Control value |
| `channel` | WiFi channel number |
| `local_timestamp` | ESP32 local timestamp (µs) |
| `sig_len` | Signal length |
| `rx_state` | Receiver state |
| `len` | CSI data length (128 = 64 subcarriers × 2 components) |
| `first_word` | Header word |
| `data` | Raw CSI data as `[I₀, Q₀, I₁, Q₁, ..., I₆₃, Q₆₃]` — 128 signed integers representing in-phase and quadrature components for 64 subcarriers |
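The snippet below sketches the first two preprocessing steps: it parses the `data` string into 64 complex values and keeps 52 subcarriers. These are hypothetical helpers, not the authors' code. Two assumptions apply. First, each CSV has a header row with the column names above. Second, the subcarrier slot mapping follows ESP-IDF's documented layout for 20 MHz LLTF CSI: slots 0–31 hold subcarriers 0 to 31, and slots 32–63 hold subcarriers −32 to −1. The DC subcarrier and the guard band (|k| > 26) are discarded. Check this mapping against the project's GitHub repository before relying on it. Because only magnitudes are returned, the result is the same whether each pair is stored I-first or Q-first.

```python
from __future__ import annotations

import csv

# ASSUMPTION: ESP-IDF layout for 20 MHz LLTF CSI. Buffer slot k holds
# subcarrier k for k = 0..31 and subcarrier k - 64 for k = 32..63.
# Slot 0 (DC) and slots 27..37 (guard band, |subcarrier| > 26) are dropped.
LLTF_SLOTS = list(range(38, 64)) + list(range(1, 27))  # subcarriers -26..-1, then 1..26


def parse_csi(data_field: str) -> list[complex]:
    """Convert '[I0, Q0, ..., I63, Q63]' into 64 complex values I + jQ."""
    values = [int(tok) for tok in data_field.strip("[] \n").replace(",", " ").split()]
    if len(values) != 128:
        raise ValueError(f"expected 128 integers, got {len(values)}")
    return [complex(i, q) for i, q in zip(values[0::2], values[1::2])]


def lltf_amplitudes(csi: list[complex]) -> list[float]:
    """Keep the 52 occupied LLTF subcarriers and return their magnitudes |I + jQ|."""
    return [abs(csi[k]) for k in LLTF_SLOTS]


def read_packets(path: str):
    """Yield (local_timestamp in µs, 52 amplitudes) for every CSI_DATA row.

    ASSUMPTION: the CSV has a header row naming the columns listed above.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["type"] != "CSI_DATA":
                continue
            yield int(row["local_timestamp"]), lltf_amplitudes(parse_csi(row["data"]))
```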
## Recommended Preprocessing Pipeline
1. **Load** CSV and parse the `data` column into complex I/Q arrays
2. **Select** 52 LLTF subcarriers (discard guard/null subcarriers)
3. **Resample** to a uniform 150 Hz sample rate (original rate is irregular ~100–200 Hz)
4. **Feature extraction**: Rolling variance over a window of W samples, with W ∈ {20, 200, 2000} (recommended: W = 200, about 1.33 s at 150 Hz)
5. **Windowing**: Segment into fixed-length windows (e.g., 100 samples ≈ 0.67 s at 150 Hz)
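Steps 3–5 can be sketched in plain Python as follows. This illustrates the steps under stated assumptions and is not the paper's implementation:

- Resampling uses linear interpolation on `local_timestamp`, which is assumed to be strictly increasing and in µs, with no counter wrap-around.
- The rolling variance is the population variance over a trailing window. The paper's exact estimator is not stated here.
- Windows are non-overlapping by default.

All function names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right

FS = 150.0  # target uniform sample rate in Hz (step 3)


def resample_uniform(ts_us: list[int], frames: list[list[float]], fs: float = FS) -> list[list[float]]:
    """Linearly interpolate irregularly spaced packets onto a uniform fs-Hz grid.

    ASSUMPTION: ts_us is strictly increasing (microseconds, no counter wrap-around)
    and contains at least two packets.
    """
    t = [(x - ts_us[0]) / 1e6 for x in ts_us]  # seconds since the first packet
    n_out = int(t[-1] * fs) + 1                # grid points 0, 1/fs, ..., <= t[-1]
    out = []
    for n in range(n_out):
        tn = n / fs
        j = min(bisect_right(t, tn), len(t) - 1)  # first packet after tn (clamped to last)
        i = j - 1                                  # last packet at or before tn
        w = (tn - t[i]) / (t[j] - t[i])
        w = min(max(w, 0.0), 1.0)                  # guard against float round-off
        out.append([a + w * (b - a) for a, b in zip(frames[i], frames[j])])
    return out


def rolling_variance(frames: list[list[float]], w: int = 200) -> list[list[float]]:
    """Trailing-window population variance of each feature column (step 4).

    Output row k summarizes input rows k .. k + w - 1; only full windows are kept.
    Running sums make this O(n) per column instead of O(n * w).
    """
    n_feat = len(frames[0])
    s = [0.0] * n_feat   # running sum per column
    s2 = [0.0] * n_feat  # running sum of squares per column
    out = []
    for k, row in enumerate(frames):
        for c, x in enumerate(row):
            s[c] += x
            s2[c] += x * x
        if k >= w:  # drop the row that just left the window
            for c, x in enumerate(frames[k - w]):
                s[c] -= x
                s2[c] -= x * x
        if k >= w - 1:  # Var = E[x^2] - E[x]^2, clamped at 0 for round-off
            out.append([max(q / w - (p / w) ** 2, 0.0) for p, q in zip(s, s2)])
    return out


def make_windows(features: list, size: int = 100, step: int = 100) -> list:
    """Cut a sequence into fixed-length windows (step 5); 100 samples = 100/150 s ≈ 0.67 s."""
    return [features[i:i + size] for i in range(0, len(features) - size + 1, step)]
```

For full recordings, a vectorized NumPy or pandas version will be much faster. For example, `pandas.Series.rolling(w).var()` computes the sample variance (`ddof=1`) by default. The pure-Python form is used here only to keep each step explicit.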
## Benchmark Results
Best results from the paper using rolling-variance features (W=200):
| Classifier | Accuracy |
|-----------|----------|
| Random Forest | 89.1% |
| XGBoost | 88.6% |
| Conv1D | 95.7% |
| CNN-LSTM | 96.7% |
| PCA + KNN | 84.1% |
Deep learning models perform best on Office Localization (CNN-LSTM 96.7%, Conv1D 95.7%). This shows that commodity WiFi CSI can support zone-level indoor localization without dedicated infrastructure, using just two off-the-shelf ESP32-C6 boards.
## Use Cases
- **Smart building management**: Automatically determine which zones are occupied
- **Energy optimization**: Zone-aware HVAC and lighting control
- **Elderly care**: Non-intrusive monitoring of movement between rooms/zones
- **Security**: Detect unauthorized presence in restricted zones
## License
This dataset is released under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
Provided by: gadgadgad



