MushanW/GLOBE_V2
Source: Hugging Face | Updated: 2024-11-24 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/MushanW/GLOBE_V2
Description:
GLOBE is a high-quality English corpus with worldwide accents, designed to address a limitation of current zero-shot speaker-adaptive text-to-speech (TTS) systems: they generalize poorly to speakers with accents. Unlike commonly used English corpora such as LibriTTS and VCTK, GLOBE contains utterances from 23,519 speakers covering 164 accents worldwide, along with detailed metadata for these speakers. Compared to its source corpus, Common Voice, GLOBE significantly improves the quality of the speech data through rigorous filtering and enhancement, and it populates all missing speaker metadata. The final curated corpus contains 535 hours of speech at a 24 kHz sampling rate. Our benchmark results indicate that a speaker-adaptive TTS model trained on GLOBE synthesizes speech with better speaker similarity than, and naturalness comparable to, models trained on other popular corpora.
Provided by:
MushanW
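
For readers who want to inspect the corpus before committing to a full download, the following is a minimal sketch using the Hugging Face `datasets` library in streaming mode. The repository id comes from the listing above; the split name `train` and the helper names `peek_globe` and `mean_seconds_per_speaker` are assumptions made for illustration and are not part of the dataset's documentation. Column names are not stated in the listing, so the sketch does not rely on any.

```python
from itertools import islice

# Repository id as given in the listing above.
REPO_ID = "MushanW/GLOBE_V2"
# Assumption: the listing does not name the splits; "train" is a guess.
ASSUMED_SPLIT = "train"


def peek_globe(n=3, split=ASSUMED_SPLIT):
    """Stream the first n records without downloading the whole corpus."""
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))


def mean_seconds_per_speaker(total_hours=535, speakers=23_519):
    """Average speech per speaker implied by the figures in the description."""
    return total_hours * 3600 / speakers
```

Calling `peek_globe()` requires network access and returns a list of record dictionaries; printing `sorted(record.keys())` for one of them is a quick way to discover the actual column names. The second helper only restates the listing's own numbers: 535 hours spread over 23,519 speakers works out to roughly 82 seconds of speech per speaker on average.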



