OzLabs/rf-spectral-trajectories
Hugging Face · Updated 2026-03-28 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/OzLabs/rf-spectral-trajectories
Resource description:
---
license: mit
task_categories:
- time-series-forecasting
tags:
- radio-frequency
- spectrogram
- signal-processing
- simulation
- trajectory
- stft
- iq-data
- wireless-communication
pretty_name: RF Spectral Trajectories
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.parquet
  - split: validation
    path: data/validation.parquet
  - split: test
    path: data/test.parquet
---
# RF Spectral Trajectories
Temporally ordered sequences of RF (radio frequency) spectrograms from simulated wideband communication environments. Each trajectory is a sliding window of 16 consecutive STFT observations from a continuous RF scene, capturing how multiple signals appear, disappear, drift in frequency, and vary in power over time.
## Dataset Structure
### Files
| File | Split | Trajectories | Size |
|------|-------|-------------|------|
| `train.h5` | train | 13,841 | 10 GB |
| `val.h5` | validation | 2,938 | 1.1 GB |
| `test.h5` | test | 2,999 | 1.2 GB |
### HDF5 Schema
Each file contains:
| Key | Shape | Dtype | Description |
|-----|-------|-------|-------------|
| `observations` | `[N, 16, 256, 51, 2]` | float16 | Complex STFT values (real and imaginary channels) |
| `timestamps` | `[N, 16]` | float64 | Time in seconds for each timestep |
| `source_ids` | `[N]` | string | Scene identifier (for provenance) |
| `sequence_ids` | `[N]` | string | Unique trajectory identifier |
**Observation tensor dimensions:**
- `N` — number of trajectories
- `16` — timesteps per trajectory (80ms each, 1.28s total)
- `256` — frequency bins (STFT, covering ±75 kHz)
- `51` — time bins within each 80ms STFT window
- `2` — real and imaginary components
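As a minimal sketch of working with this layout, the snippet below recombines the last axis into complex values and computes a magnitude in dB. It runs on a synthetic array of the documented shape rather than on real data. The helper name `to_complex`, the channel order (index 0 real, index 1 imaginary, following the order listed above), and the `1e-6` floor are assumptions, not part of the dataset tooling.

```python
import numpy as np

def to_complex(obs):
    """Combine the trailing (real, imag) axis into one complex array (hypothetical helper)."""
    obs = np.asarray(obs, dtype=np.float32)  # upcast from float16 before arithmetic
    return obs[..., 0] + 1j * obs[..., 1]

# Synthetic stand-in for one trajectory: [16, 256, 51, 2] float16
rng = np.random.default_rng(0)
fake_obs = rng.standard_normal((16, 256, 51, 2)).astype(np.float16)

z = to_complex(fake_obs)                    # [16, 256, 51], complex-valued
mag_db = 20.0 * np.log10(np.abs(z) + 1e-6)  # small floor (assumed) avoids log(0)
```

Upcasting to float32 first is a design choice: float16 has a narrow dynamic range, so squaring or taking logarithms directly in half precision can overflow or lose detail.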
### Splits
Split by **source scene** (not by trajectory) to prevent temporal leakage from overlapping sliding windows.
| Split | Scenes | Ratio |
|-------|--------|-------|
| train | 280 | 70% |
| validation | 60 | 15% |
| test | 60 | 15% |
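A scene-level split of this kind can be sketched as follows. This is an illustration only: the function name `split_by_scene`, the shuffling, and the seed are assumptions, and the card does not say how scenes were actually assigned. The point is that every trajectory inherits the split of its source scene, so overlapping windows never straddle two splits.

```python
import random

def split_by_scene(scene_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign whole scenes to train/validation/test (hypothetical helper)."""
    scenes = sorted(set(scene_ids))
    random.Random(seed).shuffle(scenes)  # assumed: random assignment with a fixed seed
    n_train = round(ratios[0] * len(scenes))
    n_val = round(ratios[1] * len(scenes))
    return {
        "train": set(scenes[:n_train]),
        "validation": set(scenes[n_train:n_train + n_val]),
        "test": set(scenes[n_train + n_val:]),
    }

# 400 scene identifiers in the "scene_0042" style shown in this card
scene_ids = [f"scene_{i:04d}" for i in range(400)]
splits = split_by_scene(scene_ids)
```

A trajectory's split is then looked up from its scene identifier, never chosen per trajectory.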
## Loading
### With HuggingFace datasets
```python
from datasets import load_dataset
import numpy as np
ds = load_dataset("ozlabs/rf-spectral-trajectories", split="train")
row = ds[0]  # fetch the row once; every ds[i] access decodes the whole row
# Metadata columns are directly accessible
print(row["regime"])     # e.g. "bursty"
print(row["snr_db"])     # e.g. 14
print(row["source_id"])  # e.g. "scene_0042"
# Decode observation tensor from binary (np.frombuffer returns a read-only array)
obs = np.frombuffer(row["observations"], dtype=np.float16).reshape(16, 256, 51, 2)
```
### With h5py (for direct HDF5 access)
```python
import h5py
with h5py.File("train.h5", "r") as f:
    obs = f["observations"][0]  # [16, 256, 51, 2] float16
    ts = f["timestamps"][0]     # [16] float64
    src = f["source_ids"][0]    # bytes
```
### With PyTorch
```python
from lewm_pipeline.dataset import LeWMDataset
ds = LeWMDataset("train.h5")
item = ds[0]
# item["observations"]: torch.float32 tensor [16, 256, 51, 2]
# item["timestamps"]: torch.float64 tensor [16]
# item["source_id"]: str
# item["sequence_id"]: str
```
## Generation Parameters
### Signal Simulation
| Parameter | Value |
|-----------|-------|
| Sample rate | 150 kHz |
| Timestep duration | 80 ms (12,000 samples) |
| Scene duration | 5.2 s (65 timesteps) |
| Trajectory length | 16 timesteps (1.28 s) |
| Sliding window stride | 1 timestep |
| Modulation types | BPSK, QPSK, 8PSK, 16QAM, 64QAM |
| Signals per scene | 2–4 (randomly placed in frequency) |
| Channel models | Rayleigh, Rician (with evolving fading) |
| SNR range | -8 to +30 dB |
| Doppler speeds | 0–12 m/s |
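The timing figures in this table can be cross-checked with a few lines of arithmetic. The sketch below derives the samples per timestep and the number of sliding windows per scene; the function name `window_starts` is hypothetical. Note that 50 windows per scene gives an upper bound of 20,000 trajectories over 400 scenes, while the Files table reports 19,778 in total; the card does not explain the difference.

```python
SAMPLE_RATE_HZ = 150_000
TIMESTEP_S = 0.080
SCENE_TIMESTEPS = 65
WINDOW = 16   # timesteps per trajectory
STRIDE = 1    # sliding window stride in timesteps

samples_per_timestep = round(SAMPLE_RATE_HZ * TIMESTEP_S)
scene_duration_s = SCENE_TIMESTEPS * TIMESTEP_S

def window_starts(n_timesteps, window, stride):
    """Start indices of every full window that fits in the scene (hypothetical helper)."""
    return list(range(0, n_timesteps - window + 1, stride))

starts = window_starts(SCENE_TIMESTEPS, WINDOW, STRIDE)
windows_per_scene = len(starts)           # upper bound per scene
max_trajectories = 400 * windows_per_scene
```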
### STFT Parameters
| Parameter | Value |
|-----------|-------|
| Window | Hamming, 256 samples |
| Overlap | 16 samples |
| FFT size | 256 |
| Sided | Two-sided (complex input) |
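These parameters imply a hop of 240 samples, and the frame count per 12,000-sample timestep depends on how the edges are handled, which the table does not state. The sketch below compares two conventions; the function name `stft_frames` is hypothetical. Under the assumption of half-window zero padding at both ends (the default behaviour of `scipy.signal.stft`), the count matches the 51 time bins in the schema, whereas no padding would give 49.

```python
def stft_frames(n_samples, window, overlap, pad_each_side=0):
    """Number of full STFT frames for a given edge padding (hypothetical helper)."""
    hop = window - overlap
    padded = n_samples + 2 * pad_each_side
    return (padded - window) // hop + 1

N_SAMPLES = 12_000  # one 80 ms timestep at 150 kHz
WINDOW = 256
OVERLAP = 16

frames_unpadded = stft_frames(N_SAMPLES, WINDOW, OVERLAP)
frames_centered = stft_frames(N_SAMPLES, WINDOW, OVERLAP, pad_each_side=WINDOW // 2)

# Frequency resolution of a 256-point FFT at 150 kHz
bin_spacing_hz = 150_000 / 256
```

With a two-sided 256-point FFT, the 256 bins span the full 150 kHz sample rate, which is the ±75 kHz range given in the schema.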
### Activity Regimes
Each scene follows one of 8 activity regimes (50 scenes each, 400 total):
| Regime | Description |
|--------|-------------|
| `quiet` | Few active signals, low duty cycle, long silences |
| `dense` | Many signals active simultaneously, high overlap |
| `bursty` | Rapid on/off transitions, short bursts |
| `ramp_up` | Signals appear progressively through the scene |
| `interference_event` | Stable signals, then a disruptive signal appears mid-scene |
| `correlated_alternating` | Signal pairs alternate: A on when B is off |
| `correlated_leader_follower` | Signal B appears 1–3 timesteps after signal A |
| `random` | Independent random burst patterns |
### Dynamic Features
- **Frequency drift**: Signals slowly wander in frequency (random walk, bounded to ±50% of signal bandwidth)
- **Power variation**: Smooth per-timestep power levels (0.3–1.0 when active), gradual fade-in/out, correlated power drift
- **Channel fading**: Rayleigh/Rician fading evolves continuously within each scene (no state reset between timesteps)
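The frequency drift described above can be illustrated with a bounded random walk. This is a sketch under stated assumptions, not the generator's code: the card specifies only a random walk bounded to ±50% of the signal bandwidth, so the Gaussian step, the clipping used to enforce the bound, and the example bandwidth and step size are all hypothetical.

```python
import numpy as np

def bounded_drift(n_steps, bandwidth_hz, step_std_hz, seed=0):
    """Random-walk frequency offset clipped to +/-50% of bandwidth (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    limit = 0.5 * bandwidth_hz
    offsets = np.empty(n_steps)
    offset = 0.0
    for t in range(n_steps):
        # Assumed: Gaussian increments, with clipping to enforce the bound
        offset = float(np.clip(offset + rng.normal(0.0, step_std_hz), -limit, limit))
        offsets[t] = offset
    return offsets

# One offset per timestep of a 65-timestep scene; the numbers are illustrative only
drift = bounded_drift(n_steps=65, bandwidth_hz=10_000.0, step_std_hz=500.0)
```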
## Source
Generated using the [ChangShuoRadioData (CSRD)](https://github.com/Singingkettle/ChangShuoRadioData) MATLAB simulation framework with the LeWM dataset pipeline.
### Citation
```bibtex
@software{csrd_rf_spectral_trajectories_2026,
title = {RF Spectral Trajectories},
author = {Ozlabs},
year = {2026},
url = {https://huggingface.co/datasets/ozlabs/rf-spectral-trajectories}
}
```
### Related
- [ChangShuoRadioData](https://github.com/Singingkettle/ChangShuoRadioData) — MATLAB RF simulation framework
- [Joint Signal Detection and AMC via Deep Learning](https://ieeexplore.ieee.org/abstract/document/10667001) — IEEE TWC paper using CSRD data
Provider:
OzLabs
Collected summary
Dataset introduction

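Construction method
In wireless communications, modeling the dynamic behavior of RF signals is key to understanding complex electromagnetic environments. The RF Spectral Trajectories dataset is generated with the MATLAB simulation framework ChangShuoRadioData and simulates wideband communication scenarios. Construction starts from continuous RF scenes lasting 5.2 seconds, divided into 80 ms timesteps. Each trajectory is cut from a scene by a sliding window of 16 consecutive short-time Fourier transform (STFT) observations with a stride of 1 timestep, giving a 1.28-second sequence. The simulation covers multiple modulation types, channel models, and a range of SNRs, and it introduces dynamic features such as frequency drift, power variation, and continuous channel fading to make the data realistic and complex in the time-frequency domain.
Usage
The dataset offers several loading options. Users can load it directly with the Hugging Face `datasets` library and access the observation tensor and the metadata columns. Where direct access to the underlying data is needed, the `h5py` library can read the HDF5 files to obtain the raw floating-point arrays and timestamps. In addition, the `LeWMDataset` class, designed for PyTorch, converts the data to tensors automatically, which simplifies integration into deep learning workflows. These interfaces let the dataset fit machine learning tasks in wireless communications such as signal detection, modulation recognition, and time-series forecasting.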
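Background and challenges
Background overview
With the rapid development of wireless communication technology, dynamic management of spectrum resources and intelligent signal sensing have become key research topics. The RF Spectral Trajectories dataset was built by the Ozlabs team in 2026, generated with the ChangShuoRadioData simulation framework, and focuses on the temporal evolution of RF signals in wideband communication environments. Centered on sequences of simulated RF spectrograms, it captures how multiple signals appear, disappear, drift in frequency, and vary in power in the time-frequency domain. It aims to provide benchmark data for time-series forecasting, signal detection, and modulation recognition. Its structured design supports in-depth analysis of complex electromagnetic scenes at the intersection of communication signal processing and machine learning.
Current challenges
In wireless signal processing, accurately modeling how multiple signals interact and evolve in a dynamic spectrum is difficult. The RF Spectral Trajectories dataset targets temporal prediction and state inference for signal trajectories in wideband environments, which requires handling signal overlap, frequency drift, time-varying fading, and noise. During construction, the simulation framework has to balance physical realism against computational cost, cover multiple modulation types, channel models, and activity regimes, and still preserve temporal continuity and dataset scale. In addition, the train, validation, and test splits must avoid temporal leakage so that model evaluation stays rigorous, which places high demands on the data generation and labeling process.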
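Common scenarios
Classic use cases
In wireless communications, dynamic spectrum monitoring is the basis for understanding complex electromagnetic environments. By simulating wideband communication scenarios, the RF Spectral Trajectories dataset provides temporally ordered sequences of RF spectrograms, with 16 consecutive STFT observation windows per trajectory. This structure suits the study of time-frequency signal dynamics, for example tracking how multiple signals drift in frequency, change gradually in power, and burst on or vanish. The dataset is used to train and evaluate deep learning models, particularly for time-series forecasting, where it supplies structured data for learning how signal activity evolves over time and frequency.
Academic problems addressed
The dataset mainly targets academic challenges in signal detection and classification for wireless communications. In dynamic multi-signal environments, traditional methods often struggle with frequency drift, power fluctuation, and channel fading. By simulating scenes with multiple modulation types, SNR levels, and activity regimes, RF Spectral Trajectories gives researchers standardized benchmark data. It addresses weak model generalization in non-stationary signal environments, supports deep-learning-based joint signal detection and automatic modulation recognition, and provides an experimental basis for research on spectrum sensing and interference management.
Practical applications
In engineering practice, the dataset is relevant to cognitive radio and dynamic spectrum access systems. Because it simulates activity regimes such as dense signals, bursty transmissions, and interference events, it can be used to train models that monitor spectrum occupancy in real time and identify unauthorized signals or abnormal interference. In military communications and civilian network management, such techniques can improve the efficiency of spectrum use and strengthen the interference resistance and security of communication systems. The simulated environment also offers a repeatable, scalable test platform for intelligent spectrum management algorithms in 5G and future wireless networks.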
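Recent research on the dataset
Latest research directions
In wireless communications, dynamic spectrum sensing and intelligent signal processing are active research areas. By simulating sequences of RF spectrograms in wideband communication environments, the RF Spectral Trajectories dataset provides a structured benchmark for studying time-varying signal behavior. Current research focuses on spatiotemporal modeling of spectral trajectories with deep learning models, aimed at real-time detection and classification of signals under frequency drift, power fluctuation, and multipath fading. The dataset supports analysis of bursty, dense, and correlated activity regimes and contributes to work on adaptive modulation recognition, dynamic spectrum access, and interference suppression, particularly for cognitive radio and the optimization of next-generation communication systems.
The content above was collected and summarized by 遇见数据集.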



