
NefIibata/SpatioChat

Source: Hugging Face · Updated 2026-04-08 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/NefIibata/SpatioChat
Resource description:
---
license: cc-by-4.0
task_categories:
- question-answering
language:
- en
tags:
- spatial-audio
- audio-llm
- multi-agent
- evaluation
- multi-turn-dialogue
- 3d-audio
pretty_name: SpatioChat
---

# SpatioChat: A 3D Audio Dialogue Benchmark for Evaluating Spatial-Semantic Joint Reasoning

## 📖 Dataset Description

**SpatioChat** is the first 3D audio dialogue benchmark dedicated to systematically evaluating the joint reasoning capabilities of Spatial Audio Large Language Models (Spatial-ALLMs). It focuses on audio scenarios with tightly connected contextual logic and highly complex semantics, moving beyond the perception of simple, isolated sound events.

The dataset contains **200 high-fidelity multi-turn dialogues** (1286 interaction turns, ~3.8 hours of audio) covering **159 diverse physical environments** (from everyday bedrooms to rare scenes such as caves).

## 🏆 Evaluation Tasks

SpatioChat evaluates models across three progressive reasoning tasks:

1. **Speaker Localization (45%):** Anchor the speaker in a multi-turn dialogue based on semantic clues and map them to physical spatial coordinates.
2. **Relational Reasoning (33%):** Understand dynamic interactions or static relative positions between speakers, as defined by the dialogue semantics.
3. **Complex Reasoning (22%):** Fuse and jointly reason over the physical sound source localization in the audio and the relative spatial semantics in the text.

## 📂 Data Structure

The dataset does not have traditional train/test splits, as it is primarily designed for **zero-shot evaluation**. Data is organized into folders by sample ID (e.g., `0cs3jd7s`). Each sample directory contains the following components:

* `0cs3jd7s.wav`: The complete mixed 3D spatial audio waveform.
* `0cs3jd7s_0.wav`: The complete un-spatialized dry audio waveform.
* `0cs3jd7s.json`: The dialogue script, containing spatial annotations, dialogue texts, and emotion prompts.
* `0cs3jd7s_QA.json`: The Question-Answering set specific to this audio sample, including CoT reasoning steps.
* `0cs3jd7s_layout.png`: The visual spatial layout mapping the geometric relationships of the scene.
* `RIR_8CH/`: Directory containing the 8-channel Room Impulse Responses (RIR) for different speaker positions.
* `segments/`: Directory containing generated dry audio slices for each individual utterance.
* `segments_8ch_direct/`: Directory containing spatial audio slices (dry audio convolved with RIR).

## 🚀 How to Use (Data Loading)

You can download the dataset directly via git clone:

```bash
git clone https://huggingface.co/datasets/NefIibata/SpatioChat
```

## 💻 GitHub Repository

For the complete data generation pipeline, please visit: [https://github.com/Nefllbata/SpatioChat](https://github.com/Nefllbata/SpatioChat)
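
The per-sample layout described under Data Structure can be sanity-checked after cloning. The snippet below is a minimal sketch, not part of the official tooling: the helper names `missing_components` and `scan_dataset` are hypothetical, and it assumes that sample folders sit directly under the cloned repository root and that each folder name equals the sample ID. Only the file and directory names come from the dataset card.

```python
from pathlib import Path

# Names taken from the "Data Structure" list; {sid} stands for the sample ID.
EXPECTED_FILES = (
    "{sid}.wav",         # mixed 3D spatial audio
    "{sid}_0.wav",       # un-spatialized dry audio
    "{sid}.json",        # dialogue script with spatial annotations
    "{sid}_QA.json",     # question-answering set with CoT steps
    "{sid}_layout.png",  # visual spatial layout
)
EXPECTED_DIRS = ("RIR_8CH", "segments", "segments_8ch_direct")


def missing_components(sample_dir):
    """Return the expected files/directories that are absent from one sample folder."""
    sample_dir = Path(sample_dir)
    sid = sample_dir.name  # assumption: folder name equals the sample ID
    names = [template.format(sid=sid) for template in EXPECTED_FILES]
    missing = [name for name in names if not (sample_dir / name).is_file()]
    missing += [d + "/" for d in EXPECTED_DIRS if not (sample_dir / d).is_dir()]
    return missing


def scan_dataset(root):
    """Map each sample ID under root to its list of missing components."""
    root = Path(root)
    return {
        p.name: missing_components(p)
        for p in sorted(root.iterdir())
        # Skip plain files and hidden folders such as .git (assumed layout).
        if p.is_dir() and not p.name.startswith(".")
    }
```

For example, `scan_dataset("SpatioChat")` run next to the cloned repository returns a dictionary keyed by sample ID; an empty list for a sample means every component named in the card was found.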
Provided by:
NefIibata