EchoX-Dialougues
收藏魔搭社区2025-12-05 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/FreedomIntelligence/EchoX-Dialougues
下载链接
链接失效反馈官方服务:
资源简介:
<div align="center">
<h1>
EchoX-Dialogues: Training Data for EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
</h1>
</div>
<p align="center">
<font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈⬛ Github</a> | <a href="https://arxiv.org/abs/2509.09174">📃 Paper</a> | <a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 Space</a> </font>
</p>
<p align="center">
<font size="3"><a href="https://huggingface.co/FreedomIntelligence/EchoX-8B">🧠 EchoX-8B</a> | <a href="https://huggingface.co/FreedomIntelligence/EchoX-3B">🧠 EchoX-3B</a> | <a href="https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus">📦 EchoX-Dialogues-Plus</a> </font>
</p>
**EchoX-Dialogues** provides the primary **speech dialogue** data used to train **EchoX**, restricted to **S2T (speech → text)** in this repository.
All input speech is **synthetic**; text is derived from public sources with **multi-stage cleaning and rewriting**. Most turns include **`asr` / `wer`** for WER-based filtering.
---
## Contents & Statistics
| Subset | Hours | Notes |
| ---------------------- | -----------: | ---------------------------------------------------------------------- |
| **Magpie-Pro-Speech+** | **327.0441** | Magpie-style instruction data, cleaned/rewritten; **synthetic speech** |
| **sharechatx** | **44.5105** | Social/casual dialogues, cleaned/rewritten; **synthetic speech** |
| **Total** | **371.5546** | Speech understanding → text output (S2T) |
---
## Data Schema (minimal)
Each example is a multi-turn conversation with:
* `id`: unique identifier
* `conversations`: list of turns; each turn includes
* `from`: `"user"` or `"assistant"`
* `value`: reference text of the turn
* `audio`: path to the waveform for this turn (when present)
* `asr` *(optional, present on most turns)*: ASR transcript of **this turn’s** audio
* `wer` *(optional, present on most turns)*: WER between `asr` and `value`
* Some subsets may include helper fields (e.g., `transcription`) for alignment/debugging.
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("KurtDu/EchoX-Dialogues", split="train")
```
### Filter by WER (example)
```python
def keep_low_wer(example, max_wer=0.2):
wers = [t["wer"] for t in example.get("conversations", []) if "wer" in t]
return bool(wers) and min(wers) <= max_wer
clean = ds.filter(keep_low_wer, fn_kwargs={"max_wer": 0.2})
```
> Load audio via your preferred I/O library (e.g., `torchaudio`, `soundfile`) using `audio` paths.
---
## Licensing & Provenance
* **This release (synthetic audio, cleaned/rewritten texts, metadata):** **Apache-2.0**
* **Upstream text sources:** if you reuse or compare with originals, follow their licenses/terms.
---
## Relation to EchoX & Resources
This dataset covers a substantial portion of EchoX’s **S2T** training data.
* Code: [https://github.com/FreedomIntelligence/EchoX](https://github.com/FreedomIntelligence/EchoX)
* Paper (abs): [https://arxiv.org/abs/2509.09174](https://arxiv.org/abs/2509.09174) • Paper (PDF): [http://arxiv.org/pdf/2509.09174](http://arxiv.org/pdf/2509.09174)
* Models:
* EchoX-8B — [https://huggingface.co/FreedomIntelligence/EchoX-8B](https://huggingface.co/FreedomIntelligence/EchoX-8B)
* EchoX-3B — [https://huggingface.co/FreedomIntelligence/EchoX-3B](https://huggingface.co/FreedomIntelligence/EchoX-3B)
* Space (demo): [https://huggingface.co/spaces/FreedomIntelligence/EchoX](https://huggingface.co/spaces/FreedomIntelligence/EchoX)
* Extended dataset (**S2S + S2T**): [https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus](https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus)
---
## Citation
If this dataset is useful, please cite EchoX:
```bibtex
@misc{zhang2025echoxmitigatingacousticsemanticgap,
title = {EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs},
author = {Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li},
year = {2025},
eprint = {2509.09174},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2509.09174}
}
```
<div align="center">
<h1>
EchoX-Dialogues:面向EchoX的训练数据——通过回声训练缓解语音转语音大语言模型(Speech-to-Speech LLMs)的声学-语义鸿沟
</h1>
</div>
<p align="center">
<font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈⬛ GitHub</a> | <a href="https://arxiv.org/abs/2509.09174">📃 论文</a> | <a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 演示空间</a> </font>
</p>
<p align="center">
<font size="3"><a href="https://huggingface.co/FreedomIntelligence/EchoX-8B">🧠 EchoX-8B 模型</a> | <a href="https://huggingface.co/FreedomIntelligence/EchoX-3B">🧠 EchoX-3B 模型</a> | <a href="https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus">📦 EchoX-Dialogues-Plus 扩展数据集</a> </font>
</p>
**EchoX-Dialogues** 为EchoX模型提供核心语音对话训练数据,本仓库仅涵盖语音转文本(Speech-to-Text,简称S2T)任务场景。所有输入语音均为合成生成;文本源自公开数据源,并经过多阶段清洗与重写处理。绝大多数对话轮次包含`asr`与`wer`字段,用于基于词错误率的筛选。
---
## 内容与统计数据
| 子集名称 | 时长(小时) | 说明 |
| ---------------------- | -----------: | ---------------------------------------------------------------------- |
| **Magpie-Pro-Speech+** | **327.0441** | Magpie风格指令数据,经清洗与重写;采用合成语音 |
| **sharechatx** | **44.5105** | 社交/休闲对话数据,经清洗与重写;采用合成语音 |
| **总计** | **371.5546** | 覆盖语音理解→文本输出(S2T)任务 |
---
## 数据 Schema(极简版)
本数据集的每条样本为多轮对话,包含以下字段:
* `id`:唯一标识符
* `conversations`:对话轮次列表,每个轮次包含以下子字段:
* `from`:对话角色,取值为`"user"`或`"assistant"`
* `value`:该轮次的参考文本
* `audio`:该轮次语音波形文件的路径(若存在)
* `asr`(可选,绝大多数轮次存在):该轮次语音的自动语音识别(Automatic Speech Recognition,ASR)转写结果
* `wer`(可选,绝大多数轮次存在):`asr`转写结果与`value`参考文本之间的词错误率(Word Error Rate,WER)
* 部分子集可能包含辅助字段(如`transcription`),用于对齐或调试。
---
## 快速上手
python
from datasets import load_dataset
ds = load_dataset("KurtDu/EchoX-Dialogues", split="train")
### 基于词错误率的筛选示例
python
def keep_low_wer(example, max_wer=0.2):
wers = [t["wer"] for t in example.get("conversations", []) if "wer" in t]
return bool(wers) and min(wers) <= max_wer
clean = ds.filter(keep_low_wer, fn_kwargs={"max_wer": 0.2})
> 可通过您偏好的音频I/O库(如`torchaudio`、`soundfile`)读取`audio`字段指向的音频文件。
---
## 许可与来源
* **本发布内容(合成语音、清洗重写后的文本、元数据):** 采用 **Apache-2.0** 许可协议
* **上游文本数据源:** 若您复用或对比原始文本,请遵循其对应的许可协议与使用条款。
---
## 与EchoX及相关资源的关联
本数据集覆盖了EchoX模型S2T任务训练数据的核心部分。
- 代码仓库:[https://github.com/FreedomIntelligence/EchoX](https://github.com/FreedomIntelligence/EchoX)
- 论文(摘要页):[https://arxiv.org/abs/2509.09174](https://arxiv.org/abs/2509.09174) • 论文(PDF版):[http://arxiv.org/pdf/2509.09174](http://arxiv.org/pdf/2509.09174)
- 模型:
- EchoX-8B — [https://huggingface.co/FreedomIntelligence/EchoX-8B](https://huggingface.co/FreedomIntelligence/EchoX-8B)
- EchoX-3B — [https://huggingface.co/FreedomIntelligence/EchoX-3B](https://huggingface.co/FreedomIntelligence/EchoX-3B)
- 演示空间:[https://huggingface.co/spaces/FreedomIntelligence/EchoX](https://huggingface.co/spaces/FreedomIntelligence/EchoX)
- 扩展数据集(**语音转语音+语音转文本**):[https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus](https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus)
---
## 引用
若本数据集对您的研究有所帮助,请引用EchoX相关论文:
bibtex
@misc{zhang2025echoxmitigatingacousticsemanticgap,
title = {EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs},
author = {Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li},
year = {2025},
eprint = {2509.09174},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2509.09174}
}
提供机构:
maas
创建时间:
2025-09-14



