CapTTS-SFT-Audio
Source: ModelScope community. Updated 2025-09-16; indexed 2025-08-30.
Download link:
https://modelscope.cn/datasets/OpenSound/CapTTS-SFT-Audio
Description:
## CapTTS-SFT Audio
Dataset used for the paper ***CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech***.
Please refer to 🤗[CapSpeech](https://huggingface.co/datasets/OpenSound/CapSpeech) for the full dataset and to the 🚀[CapSpeech](https://github.com/WangHelin1997/CapSpeech) repo for more details.
## Overview
🔥 CapSpeech is a new benchmark designed for style-captioned TTS (**CapTTS**) tasks, including style-captioned text-to-speech synthesis with sound effects (**CapTTS-SE**), accent-captioned TTS (**AccCapTTS**), emotion-captioned TTS (**EmoCapTTS**), and text-to-speech synthesis for chat agents (**AgentTTS**).
CapSpeech comprises over **10 million machine-annotated** audio-caption pairs and nearly **0.36 million human-annotated** audio-caption pairs. **3 new speech datasets** are specifically designed for the CapTTS-SE and AgentTTS tasks to enhance the benchmark’s coverage of real-world scenarios.

## License
⚠️ All resources are under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
## Citation
If you use this dataset, the models or the repository, please cite our work as follows:
```bibtex
@misc{wang2025capspeechenablingdownstreamapplications,
title={CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech},
author={Helin Wang and Jiarui Hai and Dading Chong and Karan Thakkar and Tiantian Feng and Dongchao Yang and Junhyeok Lee and Laureano Moro Velazquez and Jesus Villalba and Zengyi Qin and Shrikanth Narayanan and Mounya Elhiali and Najim Dehak},
year={2025},
eprint={2506.02863},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2506.02863},
}
```
Provider:
maas
Created:
2025-08-26



