CapSpeech-PT-SEDB-Audio
Source: ModelScope community. Updated 2025-10-09; indexed 2025-08-30.
Download link: https://modelscope.cn/datasets/OpenSound/CapSpeech-PT-SEDB-Audio
Resource description:
## CapSpeech-PT-SEDB Audio
Dataset used for the paper: ***CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech***
Please refer to 🤗[CapSpeech](https://huggingface.co/datasets/OpenSound/CapSpeech) for the whole dataset and 🚀[CapSpeech](https://github.com/WangHelin1997/CapSpeech) repo for more details.
## Overview
🔥 CapSpeech is a new benchmark designed for style-captioned TTS (**CapTTS**) tasks, including style-captioned text-to-speech synthesis with sound effects (**CapTTS-SE**), accent-captioned TTS (**AccCapTTS**), emotion-captioned TTS (**EmoCapTTS**), and text-to-speech synthesis for chat agents (**AgentTTS**).
CapSpeech comprises over **10 million machine-annotated** audio-caption pairs and nearly **0.36 million human-annotated** audio-caption pairs. **3 new speech datasets** are specifically designed for the CapTTS-SE and AgentTTS tasks to enhance the benchmark’s coverage of real-world scenarios.

## License
⚠️ All resources are under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
## Citation
If you use this dataset, the models or the repository, please cite our work as follows:
```bibtex
@misc{wang2025capspeechenablingdownstreamapplications,
  title={CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech},
  author={Helin Wang and Jiarui Hai and Dading Chong and Karan Thakkar and Tiantian Feng and Dongchao Yang and Junhyeok Lee and Laureano Moro Velazquez and Jesus Villalba and Zengyi Qin and Shrikanth Narayanan and Mounya Elhilali and Najim Dehak},
  year={2025},
  eprint={2506.02863},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2506.02863},
}
```
## Usage
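After downloading the archives, run the following commands in the directory that contains them:
```bash
# Uncomment if your system uses environment modules and needs Git LFS loaded.
# module load git-lfs

# Extract the test and development splits.
tar -xvzf test-other.tar.gz
tar -xvzf test-clean.tar.gz
tar -xvzf dev-other.tar.gz
tar -xvzf dev-clean.tar.gz

# Extract the multi-part training splits, one archive at a time.
for file in train-clean-100_part_*.tar.gz; do
    echo "Extracting $file..."
    tar -xvzf "$file"
done
for file in train-clean-360_part_*.tar.gz; do
    echo "Extracting $file..."
    tar -xvzf "$file"
done
for file in train-other-500_part_*.tar.gz; do
    echo "Extracting $file..."
    tar -xvzf "$file"
done
```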
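The commands above extract into the current directory and do not check whether an archive downloaded completely. The following is a minimal sketch of an alternative, assuming all `.tar.gz` archives sit in one directory and should be unpacked into a separate destination. The helper name `extract_all` and its arguments are hypothetical and not part of the CapSpeech repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CapSpeech): extract every .tar.gz archive
# found in a source directory into a destination directory.
# Prints the number of archives that were extracted successfully.
extract_all() {
    local src_dir
    local dest_dir
    local archive
    local count
    src_dir="$1"
    dest_dir="$2"
    count=0

    mkdir -p "$dest_dir" || return 1

    for archive in "$src_dir"/*.tar.gz; do
        # If no archive matches, the glob stays unexpanded; skip it.
        [ -e "$archive" ] || continue

        # List the archive first; a truncated or corrupt download fails here.
        if ! tar -tzf "$archive" > /dev/null 2>&1; then
            echo "Skipping unreadable archive: $archive" >&2
            continue
        fi

        # Extract quietly into the destination directory.
        tar -xzf "$archive" -C "$dest_dir" || return 1
        count=$((count + 1))
    done

    echo "$count"
}
```

For example, `extract_all . ./extracted` would unpack every readable archive from the current directory into `./extracted` (a directory name chosen here only for illustration) and report unreadable archives on standard error, so they can be downloaded again.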
Provider: maas
Created: 2025-08-26



