NefIibata/SpatioChat
Source: Hugging Face | Updated: 2026-04-08 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/NefIibata/SpatioChat
Resource description:
---
license: cc-by-4.0
task_categories:
- question-answering
language:
- en
tags:
- spatial-audio
- audio-llm
- multi-agent
- evaluation
- multi-turn-dialogue
- 3d-audio
pretty_name: SpatioChat
---
# SpatioChat: A 3D Audio Dialogue Benchmark for Evaluating Spatial-Semantic Joint Reasoning
## 📖 Dataset Description
**SpatioChat** is the first 3D audio dialogue benchmark dedicated to systematically evaluating the joint reasoning capabilities of Spatial Audio Large Language Models (Spatial-ALLMs). It focuses on audio scenarios with tightly connected contextual logic and highly complex semantics, moving beyond simple isolated sound event perception.
The dataset contains **200 high-fidelity multi-turn dialogues** (1,286 interaction turns, ~3.8 hours of audio) covering **159 diverse physical environments**, from everyday bedrooms to rarer scenes such as caves.
## 🏆 Evaluation Tasks
SpatioChat evaluates models across three progressive reasoning tasks:
1. **Speaker Localization (45%):** Anchor the speaker in a multi-turn dialogue based on semantic clues and map them to physical spatial coordinates.
2. **Relational Reasoning (33%):** Understand dynamic interactions or static relative positions between speakers defined by dialogue semantics.
3. **Complex Reasoning (22%):** Jointly reason over the physical sound source localization in the audio and the relative spatial semantics in the text, fusing the two sources of evidence.
## 📂 Data Structure
The dataset does not have traditional train/test splits, as it is primarily designed for **zero-shot evaluation**. Data is organized into folders by sample ID (e.g., `0cs3jd7s`).
Each sample directory contains the following components:
* `0cs3jd7s.wav`: The complete mixed 3D spatial audio waveform.
* `0cs3jd7s_0.wav`: The complete un-spatialized dry audio waveform.
* `0cs3jd7s.json`: The dialogue script, containing spatial annotations, dialogue texts, and emotion prompts.
* `0cs3jd7s_QA.json`: The Question-Answering set specific to this audio sample, including CoT reasoning steps.
* `0cs3jd7s_layout.png`: The visual spatial layout mapping the geometric relationships of the scene.
* `RIR_8CH/`: Directory containing the 8-channel Room Impulse Responses (RIR) for different speaker positions.
* `segments/`: Directory containing generated dry audio slices for each individual utterance.
* `segments_8ch_direct/`: Directory containing spatial audio slices (dry audio convolved with RIR).
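
The relationship between `segments/`, `RIR_8CH/`, and `segments_8ch_direct/` described above (dry audio convolved with an RIR) can be illustrated with a minimal sketch. It is not the dataset's actual generation code: the function name `spatialize` and the array layout `(n_taps, n_channels)` are assumptions made for illustration, and reading the WAV/RIR files into arrays is left out because their exact format is not specified here.

```python
import numpy as np


def spatialize(dry, rir):
    """Convolve a mono dry signal with a multichannel room impulse response.

    dry: 1-D array of shape (n_samples,)
    rir: 2-D array of shape (n_taps, n_channels); this layout is an
         assumption, the dataset's RIR files may be stored differently.
    Returns an array of shape (n_samples + n_taps - 1, n_channels).
    """
    dry = np.asarray(dry, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    if dry.ndim != 1 or rir.ndim != 2:
        raise ValueError("expected a 1-D dry signal and a 2-D RIR")
    # Convolve the same dry signal with each channel's impulse response.
    channels = [np.convolve(dry, rir[:, ch]) for ch in range(rir.shape[1])]
    return np.stack(channels, axis=1)
```

With an 8-channel RIR, `spatialize(dry, rir)` yields one output column per microphone channel. For long recordings an FFT-based convolution would typically be faster than `np.convolve`.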
## 🚀 How to Use (Data Loading)
You can download the dataset directly via git clone:
```bash
git clone https://huggingface.co/datasets/NefIibata/SpatioChat
```
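
Once the repository is cloned, the per-sample files listed under Data Structure can be located with a small helper. The sketch below assumes that sample folders sit directly under the cloned root, which this card does not state explicitly; `sample_files` and `load_qa` are hypothetical helper names, and no assumption is made about the keys inside the JSON files.

```python
import json
from pathlib import Path


def sample_files(root, sample_id):
    """Build the expected paths for one sample, following the naming above.

    Assumes the layout <root>/<sample_id>/..., which is a guess about
    where sample folders live inside the cloned repository.
    """
    d = Path(root) / sample_id
    return {
        "spatial_audio": d / f"{sample_id}.wav",
        "dry_audio": d / f"{sample_id}_0.wav",
        "script": d / f"{sample_id}.json",
        "qa": d / f"{sample_id}_QA.json",
        "layout": d / f"{sample_id}_layout.png",
        "rir_dir": d / "RIR_8CH",
        "segments_dir": d / "segments",
        "segments_spatial_dir": d / "segments_8ch_direct",
    }


def load_qa(root, sample_id):
    """Parse the QA file for one sample and return it as plain Python data."""
    with open(sample_files(root, sample_id)["qa"], encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_qa("SpatioChat", "0cs3jd7s")` would return the parsed contents of `0cs3jd7s_QA.json`, provided the clone lives in `./SpatioChat` and the folder layout matches the assumption.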
## 💻 GitHub Repository
For the complete data generation pipeline, please visit: [https://github.com/Nefllbata/SpatioChat](https://github.com/Nefllbata/SpatioChat)
Provider:
NefIibata



