Phonetically Rich Corpus for Brazilian Portuguese
arXiv · 2024-02-09 · updated 2024-08-06
Download link:
http://arxiv.org/abs/2402.05794v1
Resource description:
This study develops a phonetically rich corpus for Brazilian Portuguese to address the challenges that low-resource languages face in speech technology applications. The corpus contains 10,000 carefully selected sentences covering a wide range of phonetic variation; its phonetic richness is ensured by dedicated text processing and sentence selection algorithms. Corpus construction used a sentence selection algorithm based on triphone distribution and a novel phonemic classification method, both intended to improve the performance of speech models. The corpus is suited to applications such as automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, and helps advance speech technology for low-resource languages.
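The idea of triphone-based sentence selection can be illustrated with a greedy coverage heuristic. This is a minimal sketch, not the authors' algorithm. It assumes sentences have already been converted to phone sequences; the `corpus` dictionary format and the functions `triphones` and `greedy_select` are hypothetical. It also maximizes the number of distinct triphones covered rather than matching a target triphone distribution, which is a swapped-in, simpler objective.

```python
def triphones(phones):
    """Return the consecutive phone triples (triphones) in one phone sequence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def greedy_select(corpus, k):
    """Greedily pick up to k sentences, each time taking the one that adds
    the most triphone types not yet covered.

    corpus: dict mapping a sentence id to its list of phone symbols
            (hypothetical input format; assumes phonetic transcription is done).
    Returns (selected ids in pick order, set of covered triphones).
    """
    covered = set()
    selected = []
    remaining = dict(corpus)  # copy so the caller's dict is not modified
    while remaining and len(selected) < k:
        best_id, best_gain = None, 0
        for sid, phones in remaining.items():
            # Gain = number of new, previously uncovered triphone types.
            gain = len(set(triphones(phones)) - covered)
            if gain > best_gain:
                best_id, best_gain = sid, gain
        if best_id is None:  # no remaining sentence adds a new triphone
            break
        selected.append(best_id)
        covered |= set(triphones(remaining.pop(best_id)))
    return selected, covered


if __name__ == "__main__":
    # Toy phone sequences, invented purely for illustration.
    corpus = {
        "s1": ["k", "a", "z", "a"],
        "s2": ["k", "a", "z", "a", "s"],
        "s3": ["m", "e", "z", "a"],
    }
    ids, cov = greedy_select(corpus, 2)
    print(ids, len(cov))
```

On the toy input, `s2` is picked first because it contributes three new triphones; `s1` then adds nothing new, so `s3` is chosen next. A greedy pass like this is a common approximation for coverage-style selection because finding the optimal subset exactly is a set-cover problem.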
Provided by:
Alana AI Research São Paulo, Brazil
Created:
2024-02-09



