mfidabel/common_voice_16_1_semisupervised
Hugging Face | Updated: 2024-03-25 | Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/mfidabel/common_voice_16_1_semisupervised
Resource description:
---
dataset_info:
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 48000
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: variant
    dtype: string
  - name: predicted_sentence
    dtype: string
  splits:
  - name: whisper.medium
    num_bytes: 8271758359.625
    num_examples: 18779
  - name: whisper.small
    num_bytes: 2763174559.625
    num_examples: 18779
  - name: whisper.large.v3
    num_bytes: 8271776000.625
    num_examples: 18779
  - name: whisper.tiny
    num_bytes: 8271831240.625
    num_examples: 18779
  download_size: 23466393916
  dataset_size: 27578540160.5
configs:
- config_name: default
  data_files:
  - split: whisper.medium
    path: data/whisper.medium-*
  - split: whisper.small
    path: data/whisper.small-*
  - split: whisper.large.v3
    path: data/whisper.large.v3-*
  - split: whisper.tiny
    path: data/whisper.tiny-*
---
Provider:
mfidabel
Summary of original information
Dataset overview
Features
- client_id: string
- path: string
- audio: audio, sampling rate 48000 Hz
- sentence: string
- up_votes: 64-bit integer (int64)
- down_votes: 64-bit integer (int64)
- age: string
- gender: string
- accent: string
- locale: string
- segment: string
- variant: string
- predicted_sentence: string
Splits
- whisper.medium: 8271758359.625 bytes, 18779 examples
- whisper.small: 2763174559.625 bytes, 18779 examples
- whisper.large.v3: 8271776000.625 bytes, 18779 examples
- whisper.tiny: 8271831240.625 bytes, 18779 examples
Data size
- Download size: 23466393916 bytes
- Dataset size: 27578540160.5 bytes
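As a quick consistency check on the figures above, the four per-split byte counts should add up to the reported dataset size. The sketch below copies the numbers from the card metadata; the variable names are illustrative only and are not part of the dataset.

```python
# Per-split sizes in bytes, copied from the dataset card metadata.
split_bytes = {
    "whisper.medium": 8271758359.625,
    "whisper.small": 2763174559.625,
    "whisper.large.v3": 8271776000.625,
    "whisper.tiny": 8271831240.625,
}
examples_per_split = 18779      # identical for every split on the card
dataset_size = 27578540160.5    # reported dataset size, bytes
download_size = 23466393916     # reported download size, bytes

# The per-split sizes should sum to the reported dataset size.
total_bytes = sum(split_bytes.values())
sizes_match = total_bytes == dataset_size

# Total number of rows across all four splits.
total_examples = examples_per_split * len(split_bytes)

# Ratio of download size to dataset size.
download_ratio = download_size / dataset_size

GIB = 2**30
print(f"sum of splits: {total_bytes / GIB:.2f} GiB (matches card: {sizes_match})")
print(f"download size: {download_size / GIB:.2f} GiB ({download_ratio:.1%} of dataset size)")
print(f"total examples: {total_examples}")
```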
Configuration
- default:
  - whisper.medium: path data/whisper.medium-*
  - whisper.small: path data/whisper.small-*
  - whisper.large.v3: path data/whisper.large.v3-*
  - whisper.tiny: path data/whisper.tiny-*
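The configuration above maps each split name to a set of files under `data/`. A minimal sketch of reading one split with the Hugging Face `datasets` library follows. It assumes the library is installed and the repository is reachable; `stream_split` is a hypothetical helper written for this example, not something shipped with the dataset. Streaming is used so that the roughly 23 GB download is not fetched up front.

```python
REPO_ID = "mfidabel/common_voice_16_1_semisupervised"

# Split names as listed in the dataset card's default configuration.
SPLITS = ("whisper.medium", "whisper.small", "whisper.large.v3", "whisper.tiny")


def stream_split(split: str):
    """Return a streaming dataset for one split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the split check works without the library installed.
    # Assumption: `pip install datasets` has been run.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)


# Example usage (requires network access; left commented out on purpose):
# ds = stream_split("whisper.small")
# row = next(iter(ds))
# print(row["sentence"], "|", row["predicted_sentence"])
```

Accessing a row decodes the `audio` column, which may require an additional audio backend depending on the installed `datasets` version.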



