EchoX-Dialougues

Name: EchoX-Dialougues
Creator: maas
Published: 2025-12-05 16:50:31
License: 暂无描述

魔搭社区2025-12-05 更新2025-12-06 收录

下载链接：

https://modelscope.cn/datasets/FreedomIntelligence/EchoX-Dialougues

下载链接

链接失效反馈

官方服务：

资源简介：

<div align="center"> <h1> EchoX-Dialogues: Training Data for EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs </h1> </div> <p align="center"> <font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ Github</a>&nbsp｜&nbsp<a href="https://arxiv.org/abs/2509.09174">📃 Paper</a>&nbsp｜&nbsp<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 Space</a>&nbsp</font> </p> <p align="center"> <font size="3"><a href="https://huggingface.co/FreedomIntelligence/EchoX-8B">🧠 EchoX-8B</a>&nbsp｜&nbsp<a href="https://huggingface.co/FreedomIntelligence/EchoX-3B">🧠 EchoX-3B</a>&nbsp｜&nbsp<a href="https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus">📦 EchoX-Dialogues-Plus</a>&nbsp</font> </p> **EchoX-Dialogues** provides the primary **speech dialogue** data used to train **EchoX**, restricted to **S2T (speech → text)** in this repository. All input speech is **synthetic**; text is derived from public sources with **multi-stage cleaning and rewriting**. Most turns include **`asr` / `wer`** for WER-based filtering. --- ## Contents & Statistics | Subset | Hours | Notes | | ---------------------- | -----------: | ---------------------------------------------------------------------- | | **Magpie-Pro-Speech+** | **327.0441** | Magpie-style instruction data, cleaned/rewritten; **synthetic speech** | | **sharechatx** | **44.5105** | Social/casual dialogues, cleaned/rewritten; **synthetic speech** | | **Total** | **371.5546** | Speech understanding → text output (S2T) | --- ## Data Schema (minimal) Each example is a multi-turn conversation with: * `id`: unique identifier * `conversations`: list of turns; each turn includes * `from`: `"user"` or `"assistant"` * `value`: reference text of the turn * `audio`: path to the waveform for this turn (when present) * `asr` *(optional, present on most turns)*: ASR transcript of **this turn’s** audio * `wer` *(optional, present on most turns)*: WER between `asr` and `value` * Some subsets may include helper fields (e.g., `transcription`) for alignment/debugging. --- ## Quick Start ```python from datasets import load_dataset ds = load_dataset("KurtDu/EchoX-Dialogues", split="train") ``` ### Filter by WER (example) ```python def keep_low_wer(example, max_wer=0.2): wers = [t["wer"] for t in example.get("conversations", []) if "wer" in t] return bool(wers) and min(wers) <= max_wer clean = ds.filter(keep_low_wer, fn_kwargs={"max_wer": 0.2}) ``` > Load audio via your preferred I/O library (e.g., `torchaudio`, `soundfile`) using `audio` paths. --- ## Licensing & Provenance * **This release (synthetic audio, cleaned/rewritten texts, metadata):** **Apache-2.0** * **Upstream text sources:** if you reuse or compare with originals, follow their licenses/terms. --- ## Relation to EchoX & Resources This dataset covers a substantial portion of EchoX’s **S2T** training data. * Code: [https://github.com/FreedomIntelligence/EchoX](https://github.com/FreedomIntelligence/EchoX) * Paper (abs): [https://arxiv.org/abs/2509.09174](https://arxiv.org/abs/2509.09174) • Paper (PDF): [http://arxiv.org/pdf/2509.09174](http://arxiv.org/pdf/2509.09174) * Models: * EchoX-8B — [https://huggingface.co/FreedomIntelligence/EchoX-8B](https://huggingface.co/FreedomIntelligence/EchoX-8B) * EchoX-3B — [https://huggingface.co/FreedomIntelligence/EchoX-3B](https://huggingface.co/FreedomIntelligence/EchoX-3B) * Space (demo): [https://huggingface.co/spaces/FreedomIntelligence/EchoX](https://huggingface.co/spaces/FreedomIntelligence/EchoX) * Extended dataset (**S2S + S2T**): [https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus](https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus) --- ## Citation If this dataset is useful, please cite EchoX: ```bibtex @misc{zhang2025echoxmitigatingacousticsemanticgap, title = {EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs}, author = {Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li}, year = {2025}, eprint = {2509.09174}, archivePrefix= {arXiv}, primaryClass = {cs.CL}, url = {https://arxiv.org/abs/2509.09174} } ```

<div align="center"> <h1> EchoX-Dialogues：面向EchoX的训练数据——通过回声训练缓解语音转语音大语言模型（Speech-to-Speech LLMs）的声学-语义鸿沟 </h1> </div> <p align="center"> <font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ GitHub</a>&nbsp｜&nbsp<a href="https://arxiv.org/abs/2509.09174">📃 论文</a>&nbsp｜&nbsp<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 演示空间</a>&nbsp</font> </p> <p align="center"> <font size="3"><a href="https://huggingface.co/FreedomIntelligence/EchoX-8B">🧠 EchoX-8B 模型</a>&nbsp｜&nbsp<a href="https://huggingface.co/FreedomIntelligence/EchoX-3B">🧠 EchoX-3B 模型</a>&nbsp｜&nbsp<a href="https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus">📦 EchoX-Dialogues-Plus 扩展数据集</a>&nbsp</font> </p> **EchoX-Dialogues** 为EchoX模型提供核心语音对话训练数据，本仓库仅涵盖语音转文本（Speech-to-Text，简称S2T）任务场景。所有输入语音均为合成生成；文本源自公开数据源，并经过多阶段清洗与重写处理。绝大多数对话轮次包含`asr`与`wer`字段，用于基于词错误率的筛选。 --- ## 内容与统计数据 | 子集名称 | 时长（小时） | 说明 | | ---------------------- | -----------: | ---------------------------------------------------------------------- | | **Magpie-Pro-Speech+** | **327.0441** | Magpie风格指令数据，经清洗与重写；采用合成语音 | | **sharechatx** | **44.5105** | 社交/休闲对话数据，经清洗与重写；采用合成语音 | | **总计** | **371.5546** | 覆盖语音理解→文本输出（S2T）任务 | --- ## 数据 Schema（极简版）本数据集的每条样本为多轮对话，包含以下字段： * `id`：唯一标识符 * `conversations`：对话轮次列表，每个轮次包含以下子字段： * `from`：对话角色，取值为`"user"`或`"assistant"` * `value`：该轮次的参考文本 * `audio`：该轮次语音波形文件的路径（若存在） * `asr`（可选，绝大多数轮次存在）：该轮次语音的自动语音识别（Automatic Speech Recognition，ASR）转写结果 * `wer`（可选，绝大多数轮次存在）：`asr`转写结果与`value`参考文本之间的词错误率（Word Error Rate，WER） * 部分子集可能包含辅助字段（如`transcription`），用于对齐或调试。 --- ## 快速上手 python from datasets import load_dataset ds = load_dataset("KurtDu/EchoX-Dialogues", split="train") ### 基于词错误率的筛选示例 python def keep_low_wer(example, max_wer=0.2): wers = [t["wer"] for t in example.get("conversations", []) if "wer" in t] return bool(wers) and min(wers) <= max_wer clean = ds.filter(keep_low_wer, fn_kwargs={"max_wer": 0.2}) > 可通过您偏好的音频I/O库（如`torchaudio`、`soundfile`）读取`audio`字段指向的音频文件。 --- ## 许可与来源 * **本发布内容（合成语音、清洗重写后的文本、元数据）：** 采用 **Apache-2.0** 许可协议 * **上游文本数据源：** 若您复用或对比原始文本，请遵循其对应的许可协议与使用条款。 --- ## 与EchoX及相关资源的关联本数据集覆盖了EchoX模型S2T任务训练数据的核心部分。 - 代码仓库：[https://github.com/FreedomIntelligence/EchoX](https://github.com/FreedomIntelligence/EchoX) - 论文（摘要页）：[https://arxiv.org/abs/2509.09174](https://arxiv.org/abs/2509.09174) • 论文（PDF版）：[http://arxiv.org/pdf/2509.09174](http://arxiv.org/pdf/2509.09174) - 模型： - EchoX-8B — [https://huggingface.co/FreedomIntelligence/EchoX-8B](https://huggingface.co/FreedomIntelligence/EchoX-8B) - EchoX-3B — [https://huggingface.co/FreedomIntelligence/EchoX-3B](https://huggingface.co/FreedomIntelligence/EchoX-3B) - 演示空间：[https://huggingface.co/spaces/FreedomIntelligence/EchoX](https://huggingface.co/spaces/FreedomIntelligence/EchoX) - 扩展数据集（**语音转语音+语音转文本**）：[https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus](https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus) --- ## 引用若本数据集对您的研究有所帮助，请引用EchoX相关论文： bibtex @misc{zhang2025echoxmitigatingacousticsemanticgap, title = {EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs}, author = {Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li}, year = {2025}, eprint = {2509.09174}, archivePrefix= {arXiv}, primaryClass = {cs.CL}, url = {https://arxiv.org/abs/2509.09174} }

提供机构：

maas

创建时间：

2025-09-14

5,000+

优质数据集

54 个

任务类型

进入经典数据集