CapSpeech-CommonVoice

Name: CapSpeech-CommonVoice
Creator: maas
Published: 2025-12-05 11:49:22
License: No description provided

ModelScope community; updated 2025-12-05; indexed 2025-08-30

Download link:

https://modelscope.cn/datasets/OpenSound/CapSpeech-CommonVoice

Description:

## CapSpeech-CommonVoice Audio

Dataset used for the paper: ***CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech***

Please refer to 🤗[CapSpeech](https://huggingface.co/datasets/OpenSound/CapSpeech) for the whole dataset and 🚀[CapSpeech](https://github.com/WangHelin1997/CapSpeech) repo for more details.

## Overview

🔥 CapSpeech is a new benchmark designed for style-captioned TTS (**CapTTS**) tasks, including style-captioned text-to-speech synthesis with sound effects (**CapTTS-SE**), accent-captioned TTS (**AccCapTTS**), emotion-captioned TTS (**EmoCapTTS**) and text-to-speech synthesis for chat agent (**AgentTTS**). CapSpeech comprises over **10 million machine-annotated** audio-caption pairs and nearly **0.36 million human-annotated** audio-caption pairs. **3 new speech datasets** are specifically designed for the CapTTS-SE and AgentTTS tasks to enhance the benchmark's coverage of real-world scenarios.

![Overview](https://raw.githubusercontent.com/WangHelin1997/CapSpeech-demo/main/static/images/present.jpg)

## License

⚠️ All resources are under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

## Citation

If you use this dataset, the models or the repository, please cite our work as follows:

```bibtex
@misc{wang2025capspeechenablingdownstreamapplications,
  title={CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech},
  author={Helin Wang and Jiarui Hai and Dading Chong and Karan Thakkar and Tiantian Feng and Dongchao Yang and Junhyeok Lee and Laureano Moro Velazquez and Jesus Villalba and Zengyi Qin and Shrikanth Narayanan and Mounya Elhiali and Najim Dehak},
  year={2025},
  eprint={2506.02863},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2506.02863},
}
```

Provider:

maas

Created:

2025-08-26
