Phonetically Rich Corpus for Brazilian Portuguese

Name: Phonetically Rich Corpus for Brazilian Portuguese
Creator: Alana AI Research São Paulo, Brazil
Published: 2024-02-09 00:36:11
License: No description available

Source: arXiv. Updated: 2024-02-09. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/2402.05794v1

Description:

This study builds a phonetically rich corpus for Brazilian Portuguese to address the challenges that low-resource languages face in speech technology applications. The corpus contains 10,000 carefully selected sentences covering a wide range of phonetic variation; dedicated text processing and sentence selection algorithms ensure its phonetic richness. Construction relied on a sentence selection algorithm based on triphone distribution and on a new phonemic classification method, both intended to improve the performance of speech models. The corpus is suited to applications such as automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, and helps advance speech technology for low-resource languages.
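The description names a triphone distribution-based sentence selection algorithm but does not spell it out. As a rough illustration of the general idea only, the sketch below uses a simple greedy coverage heuristic: it repeatedly picks the candidate sentence that contributes the most triphone types not yet covered. This is an assumed stand-in, not the authors' method. The function names `triphones` and `select_sentences`, the input format (sentence text mapped to a list of phone symbols), and the stopping rule are all hypothetical.

```python
def triphones(phones):
    """Return every window of three consecutive symbols in a phone sequence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def select_sentences(candidates, k):
    """Greedily pick up to k sentences by new triphone coverage.

    candidates: dict mapping sentence text -> list of phone symbols
                (assumed to come from a separate grapheme-to-phoneme step).
    Returns the selected sentences in pick order and the covered triphone set.
    """
    covered = set()
    remaining = {s: set(triphones(p)) for s, p in candidates.items()}
    selected = []
    while remaining and len(selected) < k:
        # Pick the sentence that adds the most triphone types not yet covered.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds anything new
        covered |= remaining.pop(best)
        selected.append(best)
    return selected, covered
```

A selection method that targets the triphone distribution, as the description suggests, would likely weight triphones by frequency or rarity rather than count them equally; the uniform count here is just the simplest variant to show.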

Provider:

Alana AI Research São Paulo, Brazil

Created:

2024-02-09
