five

EchoX-Dialougues

收藏
魔搭社区2025-12-05 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/FreedomIntelligence/EchoX-Dialougues
下载链接
链接失效反馈
官方服务:
资源简介:
<div align="center"> <h1> EchoX-Dialogues: Training Data for EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs </h1> </div> <p align="center"> <font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ Github</a>&nbsp|&nbsp<a href="https://arxiv.org/abs/2509.09174">📃 Paper</a>&nbsp|&nbsp<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 Space</a>&nbsp</font> </p> <p align="center"> <font size="3"><a href="https://huggingface.co/FreedomIntelligence/EchoX-8B">🧠 EchoX-8B</a>&nbsp|&nbsp<a href="https://huggingface.co/FreedomIntelligence/EchoX-3B">🧠 EchoX-3B</a>&nbsp|&nbsp<a href="https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus">📦 EchoX-Dialogues-Plus</a>&nbsp</font> </p> **EchoX-Dialogues** provides the primary **speech dialogue** data used to train **EchoX**, restricted to **S2T (speech → text)** in this repository. All input speech is **synthetic**; text is derived from public sources with **multi-stage cleaning and rewriting**. Most turns include **`asr` / `wer`** for WER-based filtering. --- ## Contents & Statistics | Subset | Hours | Notes | | ---------------------- | -----------: | ---------------------------------------------------------------------- | | **Magpie-Pro-Speech+** | **327.0441** | Magpie-style instruction data, cleaned/rewritten; **synthetic speech** | | **sharechatx** | **44.5105** | Social/casual dialogues, cleaned/rewritten; **synthetic speech** | | **Total** | **371.5546** | Speech understanding → text output (S2T) | --- ## Data Schema (minimal) Each example is a multi-turn conversation with: * `id`: unique identifier * `conversations`: list of turns; each turn includes * `from`: `"user"` or `"assistant"` * `value`: reference text of the turn * `audio`: path to the waveform for this turn (when present) * `asr` *(optional, present on most turns)*: ASR transcript of **this turn’s** audio * `wer` *(optional, present on most turns)*: WER between `asr` and `value` * Some subsets may include helper fields (e.g., `transcription`) for alignment/debugging. --- ## Quick Start ```python from datasets import load_dataset ds = load_dataset("KurtDu/EchoX-Dialogues", split="train") ``` ### Filter by WER (example) ```python def keep_low_wer(example, max_wer=0.2): wers = [t["wer"] for t in example.get("conversations", []) if "wer" in t] return bool(wers) and min(wers) <= max_wer clean = ds.filter(keep_low_wer, fn_kwargs={"max_wer": 0.2}) ``` > Load audio via your preferred I/O library (e.g., `torchaudio`, `soundfile`) using `audio` paths. --- ## Licensing & Provenance * **This release (synthetic audio, cleaned/rewritten texts, metadata):** **Apache-2.0** * **Upstream text sources:** if you reuse or compare with originals, follow their licenses/terms. --- ## Relation to EchoX & Resources This dataset covers a substantial portion of EchoX’s **S2T** training data. * Code: [https://github.com/FreedomIntelligence/EchoX](https://github.com/FreedomIntelligence/EchoX) * Paper (abs): [https://arxiv.org/abs/2509.09174](https://arxiv.org/abs/2509.09174) • Paper (PDF): [http://arxiv.org/pdf/2509.09174](http://arxiv.org/pdf/2509.09174) * Models: * EchoX-8B — [https://huggingface.co/FreedomIntelligence/EchoX-8B](https://huggingface.co/FreedomIntelligence/EchoX-8B) * EchoX-3B — [https://huggingface.co/FreedomIntelligence/EchoX-3B](https://huggingface.co/FreedomIntelligence/EchoX-3B) * Space (demo): [https://huggingface.co/spaces/FreedomIntelligence/EchoX](https://huggingface.co/spaces/FreedomIntelligence/EchoX) * Extended dataset (**S2S + S2T**): [https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus](https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus) --- ## Citation If this dataset is useful, please cite EchoX: ```bibtex @misc{zhang2025echoxmitigatingacousticsemanticgap, title = {EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs}, author = {Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li}, year = {2025}, eprint = {2509.09174}, archivePrefix= {arXiv}, primaryClass = {cs.CL}, url = {https://arxiv.org/abs/2509.09174} } ```

<div align="center"> <h1> EchoX-Dialogues:面向EchoX的训练数据——通过回声训练缓解语音转语音大语言模型(Speech-to-Speech LLMs)的声学-语义鸿沟 </h1> </div> <p align="center"> <font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ GitHub</a>&nbsp|&nbsp<a href="https://arxiv.org/abs/2509.09174">📃 论文</a>&nbsp|&nbsp<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 演示空间</a>&nbsp</font> </p> <p align="center"> <font size="3"><a href="https://huggingface.co/FreedomIntelligence/EchoX-8B">🧠 EchoX-8B 模型</a>&nbsp|&nbsp<a href="https://huggingface.co/FreedomIntelligence/EchoX-3B">🧠 EchoX-3B 模型</a>&nbsp|&nbsp<a href="https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus">📦 EchoX-Dialogues-Plus 扩展数据集</a>&nbsp</font> </p> **EchoX-Dialogues** 为EchoX模型提供核心语音对话训练数据,本仓库仅涵盖语音转文本(Speech-to-Text,简称S2T)任务场景。所有输入语音均为合成生成;文本源自公开数据源,并经过多阶段清洗与重写处理。绝大多数对话轮次包含`asr`与`wer`字段,用于基于词错误率的筛选。 --- ## 内容与统计数据 | 子集名称 | 时长(小时) | 说明 | | ---------------------- | -----------: | ---------------------------------------------------------------------- | | **Magpie-Pro-Speech+** | **327.0441** | Magpie风格指令数据,经清洗与重写;采用合成语音 | | **sharechatx** | **44.5105** | 社交/休闲对话数据,经清洗与重写;采用合成语音 | | **总计** | **371.5546** | 覆盖语音理解→文本输出(S2T)任务 | --- ## 数据 Schema(极简版) 本数据集的每条样本为多轮对话,包含以下字段: * `id`:唯一标识符 * `conversations`:对话轮次列表,每个轮次包含以下子字段: * `from`:对话角色,取值为`"user"`或`"assistant"` * `value`:该轮次的参考文本 * `audio`:该轮次语音波形文件的路径(若存在) * `asr`(可选,绝大多数轮次存在):该轮次语音的自动语音识别(Automatic Speech Recognition,ASR)转写结果 * `wer`(可选,绝大多数轮次存在):`asr`转写结果与`value`参考文本之间的词错误率(Word Error Rate,WER) * 部分子集可能包含辅助字段(如`transcription`),用于对齐或调试。 --- ## 快速上手 python from datasets import load_dataset ds = load_dataset("KurtDu/EchoX-Dialogues", split="train") ### 基于词错误率的筛选示例 python def keep_low_wer(example, max_wer=0.2): wers = [t["wer"] for t in example.get("conversations", []) if "wer" in t] return bool(wers) and min(wers) <= max_wer clean = ds.filter(keep_low_wer, fn_kwargs={"max_wer": 0.2}) > 可通过您偏好的音频I/O库(如`torchaudio`、`soundfile`)读取`audio`字段指向的音频文件。 --- ## 许可与来源 * **本发布内容(合成语音、清洗重写后的文本、元数据):** 采用 **Apache-2.0** 许可协议 * **上游文本数据源:** 若您复用或对比原始文本,请遵循其对应的许可协议与使用条款。 --- ## 与EchoX及相关资源的关联 本数据集覆盖了EchoX模型S2T任务训练数据的核心部分。 - 代码仓库:[https://github.com/FreedomIntelligence/EchoX](https://github.com/FreedomIntelligence/EchoX) - 论文(摘要页):[https://arxiv.org/abs/2509.09174](https://arxiv.org/abs/2509.09174) • 论文(PDF版):[http://arxiv.org/pdf/2509.09174](http://arxiv.org/pdf/2509.09174) - 模型: - EchoX-8B — [https://huggingface.co/FreedomIntelligence/EchoX-8B](https://huggingface.co/FreedomIntelligence/EchoX-8B) - EchoX-3B — [https://huggingface.co/FreedomIntelligence/EchoX-3B](https://huggingface.co/FreedomIntelligence/EchoX-3B) - 演示空间:[https://huggingface.co/spaces/FreedomIntelligence/EchoX](https://huggingface.co/spaces/FreedomIntelligence/EchoX) - 扩展数据集(**语音转语音+语音转文本**):[https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus](https://huggingface.co/datasets/KurtDu/EchoX-Dialogues-Plus) --- ## 引用 若本数据集对您的研究有所帮助,请引用EchoX相关论文: bibtex @misc{zhang2025echoxmitigatingacousticsemanticgap, title = {EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs}, author = {Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li}, year = {2025}, eprint = {2509.09174}, archivePrefix= {arXiv}, primaryClass = {cs.CL}, url = {https://arxiv.org/abs/2509.09174} }
提供机构:
maas
创建时间:
2025-09-14
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作