
parandl/common_voice_16_1_hi_pseudo_labelled

Source: Hugging Face. Updated 2024-11-06. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/parandl/common_voice_16_1_hi_pseudo_labelled
Description:
The dataset has two configurations: English (en) and Hindi (hi). Each configuration has the same features: path, audio (sampled at 16,000 Hz), sentence, condition_on_prev (a sequence), and whisper_transcript. Each configuration is divided into train, validation, and test splits, with the byte size and number of examples listed for each split. The English configuration has 230,869 train examples, 3,680 validation examples, and 3,653 test examples. The Hindi configuration has 730 train examples, 414 validation examples, and 584 test examples. The total download size and total dataset size are also listed.
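As a rough illustration of how the configurations and splits described above fit together, the following Python sketch records the quoted example counts and derives per-configuration totals and split fractions. It also shows one plausible way to load a split with the Hugging Face `datasets` library. The helper names (`total_examples`, `split_fractions`, `load_split`) are hypothetical, and the configuration names `"en"` and `"hi"` are assumed from the description rather than verified against the repository.

```python
# Example counts copied from the dataset description; nothing here is measured.
SPLIT_SIZES = {
    "en": {"train": 230_869, "validation": 3_680, "test": 3_653},
    "hi": {"train": 730, "validation": 414, "test": 584},
}

REPO_ID = "parandl/common_voice_16_1_hi_pseudo_labelled"


def total_examples(config: str) -> int:
    """Sum the example counts over all splits of one configuration."""
    return sum(SPLIT_SIZES[config].values())


def split_fractions(config: str) -> dict:
    """Return each split's share of the configuration's examples."""
    total = total_examples(config)
    return {split: n / total for split, n in SPLIT_SIZES[config].items()}


def load_split(config: str, split: str):
    """Load one split from the Hub (assumes `datasets` is installed and
    network access is available; config names are an assumption)."""
    from datasets import load_dataset  # lazy import keeps the helpers above usable offline

    return load_dataset(REPO_ID, config, split=split)


if __name__ == "__main__":
    for config in SPLIT_SIZES:
        print(config, total_examples(config), split_fractions(config))
```

Note that the Hindi configuration is much smaller than the English one, and its train split holds well under half of its examples, so any evaluation on it is worth checking against the split sizes first.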
Provider:
parandl