
mort666/cv_corpus_v22

Hugging Face | Updated 2025-12-18 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/mort666/cv_corpus_v22
Resource overview:
This dataset is an unofficial version of the Mozilla Common Voice Corpus 22, downloaded and converted from the project's website, https://commonvoice.mozilla.org/. It is currently being converted to Parquet format for easier use. The dataset covers many languages and is intended for automatic speech recognition tasks. Its features are client_id, path, audio (sampling rate 48000), sentence, up_votes, down_votes, age, gender, accent, locale, segment, and variant. The size category is listed as between 100B and 1T. The dataset provides a separate configuration for each language, each with the splits train, validation, test, other, and invalidated.
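As a rough illustration of the schema described above, the following Python sketch records the listed columns and splits and checks a row against them. The helper names `missing_columns` and `load_split` are hypothetical, and the assumption that each configuration is named by a language code such as "en" is not confirmed by this page. `load_split` relies on the Hugging Face `datasets` library and on network access, so it is only defined here, not called.

```python
from typing import Any

# Column names and splits exactly as listed in the dataset description.
EXPECTED_COLUMNS = (
    "client_id", "path", "audio", "sentence", "up_votes", "down_votes",
    "age", "gender", "accent", "locale", "segment", "variant",
)
SPLITS = ("train", "validation", "test", "other", "invalidated")
SAMPLING_RATE = 48_000  # audio sampling rate stated in the description


def missing_columns(row: dict[str, Any]) -> list[str]:
    """Return the expected columns that are absent from a row, in schema order."""
    return [name for name in EXPECTED_COLUMNS if name not in row]


def load_split(config: str, split: str = "train"):
    """Stream one split of one language configuration (hypothetical helper).

    Assumption: `config` is a language configuration name such as "en";
    the actual configuration names are not given on this page.
    Requires `pip install datasets` and network access.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the schema helpers work without the library installed.
    from datasets import load_dataset

    # streaming=True avoids downloading the whole corpus before iterating.
    return load_dataset("mort666/cv_corpus_v22", config, split=split, streaming=True)
```

Streaming is chosen here because of the corpus's large size category. Since the page notes that the Parquet conversion is still in progress, some configurations may not load yet.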
Provider:
mort666