CapSpeech-PT-SEDB-HQ-Audio

Name: CapSpeech-PT-SEDB-HQ-Audio
Creator: maas
Published: 2025-10-04 16:46:39
License: No description provided

Source: ModelScope community. Updated: 2025-10-04. Indexed: 2025-08-30.

Download link:

https://modelscope.cn/datasets/OpenSound/CapSpeech-PT-SEDB-HQ-Audio

Description:

## CapSpeech-PT-SEDB-HQ Audio

Dataset used for the paper: ***CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech***

Please refer to 🤗[CapSpeech](https://huggingface.co/datasets/OpenSound/CapSpeech) for the whole dataset and 🚀[CapSpeech](https://github.com/WangHelin1997/CapSpeech) repo for more details.

## Overview

🔥 CapSpeech is a new benchmark designed for style-captioned TTS (**CapTTS**) tasks, including style-captioned text-to-speech synthesis with sound effects (**CapTTS-SE**), accent-captioned TTS (**AccCapTTS**), emotion-captioned TTS (**EmoCapTTS**) and text-to-speech synthesis for chat agent (**AgentTTS**).

CapSpeech comprises over **10 million machine-annotated** audio-caption pairs and nearly **0.36 million human-annotated** audio-caption pairs. **3 new speech datasets** are specifically designed for the CapTTS-SE and AgentTTS tasks to enhance the benchmark’s coverage of real-world scenarios.

![Overview](https://raw.githubusercontent.com/WangHelin1997/CapSpeech-demo/main/static/images/present.jpg)

## License

⚠️ All resources are under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

## Citation

If you use this dataset, the models or the repository, please cite our work as follows:

```bibtex
@misc{wang2025capspeechenablingdownstreamapplications,
  title={CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech},
  author={Helin Wang and Jiarui Hai and Dading Chong and Karan Thakkar and Tiantian Feng and Dongchao Yang and Junhyeok Lee and Laureano Moro Velazquez and Jesus Villalba and Zengyi Qin and Shrikanth Narayanan and Mounya Elhiali and Najim Dehak},
  year={2025},
  eprint={2506.02863},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2506.02863},
}
```


Provider: maas
Created: 2025-08-26
