P4 Speech Corpus
Source: arXiv. Updated: 2022-05-20. Indexed: 2024-06-21.
Download link:
https://huggingface.co/KBLab
Resource overview:
The P4 Speech Corpus is a large-scale speech dataset created by the National Library of Sweden. It contains over 2.7 million hours of recordings, sourced primarily from Swedish local public radio. By sampling widely and balancing regional dialects, the corpus is intended to support a more representative and more democratic model. The data was extracted from broadcasts, podcasts, and audiobooks and preprocessed with automated methods. The corpus is used mainly for automatic speech recognition (ASR), where it aims to ease the shortage of resources for low-resource languages and improve recognition performance.
Provider:
National Library of Sweden
Created:
2022-05-06



