
malaysia-ai/common_voice_22_0

Source: Hugging Face; updated 2025-09-29; added to the catalog 2025-10-25
Download link:
https://hf-mirror.com/datasets/malaysia-ai/common_voice_22_0
Description:
Common Voice Corpus 22.0 is an open-source speech dataset containing voice recordings in many languages. It is published in two configurations, default and pseudospeaker. In the default configuration, each record pairs a sentence with related metadata such as the sentence domain, voting information, and the speaker's age, gender, and accent. The pseudospeaker configuration focuses on speaker information and locale. The dataset is split into training, validation, and test sets and can be used for speech recognition, language model training, and other NLP tasks.
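A minimal sketch of loading one configuration with the Hugging Face `datasets` library, assuming the repository works with the standard `load_dataset` loader. The configuration names come from the description above. The split name "train" is an assumption (the usual Hugging Face convention), and the helper `load_common_voice` is hypothetical, written only for illustration.

```python
# Configuration names as listed in the dataset description.
CONFIGS = ("default", "pseudospeaker")
REPO_ID = "malaysia-ai/common_voice_22_0"


def load_common_voice(config="default", split="train", streaming=True):
    """Hypothetical helper: load one configuration/split of the dataset.

    The split name "train" is an assumed Hugging Face convention; check the
    dataset card for the actual split names.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the config check above works without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over the remote files instead of
    # downloading the whole split before returning.
    return load_dataset(REPO_ID, name=config, split=split, streaming=streaming)
```

For example, `ds = load_common_voice("pseudospeaker")` followed by `next(iter(ds)).keys()` shows which fields that configuration actually exposes. To download through the mirror listed above instead of huggingface.co, the `HF_ENDPOINT` environment variable can be set to `https://hf-mirror.com` before `datasets` is first imported.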
Provided by:
malaysia-ai