malaysia-ai/common_voice_22_0
Source: Hugging Face. Updated: 2025-09-29. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/malaysia-ai/common_voice_22_0
Description:
Common Voice Corpus 22.0 is an open-source speech dataset covering multiple languages. It offers two configurations, default and pseudospeaker. In the default configuration, each record contains a sentence and its associated metadata, such as the sentence domain, votes, age, gender, and accent. The pseudospeaker configuration focuses on speaker information and locale. The dataset is split into training, validation, and test sets and can be used for speech recognition, language model training, and other NLP tasks.
Provider:
malaysia-ai
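
Usage sketch:
The two configurations described above could be selected as follows with the Hugging Face `datasets` library. This is a minimal sketch, not the provider's documented usage: the repository name and configuration names come from this listing, while the split name "train" and the helper names `build_load_args` and `peek` are assumptions made for illustration. Check the dataset page for the actual split names before relying on them.

```python
from typing import Any

REPO_ID = "malaysia-ai/common_voice_22_0"   # repository name from this listing
CONFIGS = ("default", "pseudospeaker")      # the two configurations named in the description


def build_load_args(config: str = "default", split: str = "train") -> dict[str, Any]:
    """Return keyword arguments for datasets.load_dataset after checking the config name.

    The split name is an assumption; the listing only says the data has
    training, validation, and test sets.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # streaming=True avoids downloading the whole corpus before reading any record.
    return {"path": REPO_ID, "name": config, "split": split, "streaming": True}


def peek(config: str = "default", split: str = "train", n: int = 3) -> list[dict[str, Any]]:
    """Fetch the first n records. Needs network access and the `datasets` package."""
    from datasets import load_dataset  # imported lazily so build_load_args works offline

    stream = load_dataset(**build_load_args(config, split))
    return [row for _, row in zip(range(n), stream)]
```

Calling `peek("pseudospeaker")` would return a few records from the speaker-focused configuration, which is a quick way to inspect the available fields before committing to a full download.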



