Speech Datasets for Few-Shot Speaker Adaptation
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/CorentinJ/Real-Time-Voice-Cloning
Description:
This dataset contains audio recordings from female and male speakers, intended for fine-tuning a pre-trained multi-speaker Grad-TTS model for speaker adaptation. Each speaker's dataset comprises 16 clean recordings and 2 impaired sets containing additional recordings with extraneous sounds. There are 10 female and 4 male speakers (14 in total), and multiple datasets were constructed for each speaker for use in speech synthesis and speaker similarity evaluation.
Provider:
Custom recordings



