CapSpeech_GigaSpeech

Name: CapSpeech_GigaSpeech
Creator: maas
Published: 2025-12-05 11:49:22
License: No description provided

Source: ModelScope community. Updated 2025-12-05; indexed 2025-08-30.

Download link:

https://modelscope.cn/datasets/OpenSound/CapSpeech_GigaSpeech

Resource description:

## CapSpeech-GigaSpeech

Audio dataset used for the paper: ***CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech***

Please refer to 🤗[CapSpeech](https://huggingface.co/datasets/OpenSound/CapSpeech) for the whole dataset and the 🚀[CapSpeech](https://github.com/WangHelin1997/CapSpeech) repo for more details.

## Overview

🔥 CapSpeech is a new benchmark designed for style-captioned TTS (**CapTTS**) tasks, including style-captioned text-to-speech synthesis with sound effects (**CapTTS-SE**), accent-captioned TTS (**AccCapTTS**), emotion-captioned TTS (**EmoCapTTS**) and text-to-speech synthesis for chat agent (**AgentTTS**).

CapSpeech comprises over **10 million machine-annotated** audio-caption pairs and nearly **0.36 million human-annotated** audio-caption pairs. **3 new speech datasets** are specifically designed for the CapTTS-SE and AgentTTS tasks to enhance the benchmark’s coverage of real-world scenarios.

![Overview](https://raw.githubusercontent.com/WangHelin1997/CapSpeech-demo/main/static/images/present.jpg)

## License

⚠️ All resources are under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

## Citation

If you use this dataset, the models or the repository, please cite our work as follows:

```bibtex
@misc{wang2025capspeechenablingdownstreamapplications,
  title={CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech},
  author={Helin Wang and Jiarui Hai and Dading Chong and Karan Thakkar and Tiantian Feng and Dongchao Yang and Junhyeok Lee and Laureano Moro Velazquez and Jesus Villalba and Zengyi Qin and Shrikanth Narayanan and Mounya Elhiali and Najim Dehak},
  year={2025},
  eprint={2506.02863},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2506.02863},
}
```

Provider: maas

Created: 2025-08-26