qmeeus/vp-er-14l
Source: Hugging Face. Updated 2024-02-14; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/qmeeus/vp-er-14l
Resource description:
---
dataset_info:
- config_name: multilang
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 9267861770.0
num_examples: 28000
download_size: 5714236793
dataset_size: 9267861770.0
- config_name: transcribe_cs
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 3967671467.0
num_examples: 12000
download_size: 3962625704
dataset_size: 3967671467.0
- config_name: transcribe_de
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 3496757286.0
num_examples: 12000
download_size: 3486791342
dataset_size: 3496757286.0
- config_name: transcribe_en
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 4000348474.0
num_examples: 12000
download_size: 3984271576
dataset_size: 4000348474.0
- config_name: transcribe_es
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 354422163.0
num_examples: 1000
download_size: 353400896
dataset_size: 354422163.0
- config_name: transcribe_fi
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 332239755.0
num_examples: 1000
download_size: 331416051
dataset_size: 332239755.0
- config_name: transcribe_fr
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 328771385.0
num_examples: 1000
download_size: 328033802
dataset_size: 328771385.0
- config_name: transcribe_hr
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 365306344.0
num_examples: 1000
download_size: 364635100
dataset_size: 365306344.0
- config_name: transcribe_hu
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 341476551.0
num_examples: 1000
download_size: 341060381
dataset_size: 341476551.0
- config_name: transcribe_it
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 387784161.0
num_examples: 1000
download_size: 386989800
dataset_size: 387784161.0
- config_name: transcribe_nl
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 248989364.0
num_examples: 1000
download_size: 248410994
dataset_size: 248989364.0
- config_name: transcribe_pl
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 321050950.0
num_examples: 1000
download_size: 320461314
dataset_size: 321050950.0
- config_name: transcribe_ro
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 357399352.0
num_examples: 1000
download_size: 356905704
dataset_size: 357399352.0
- config_name: transcribe_sk
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 327014013.0
num_examples: 1000
download_size: 326201959
dataset_size: 327014013.0
- config_name: transcribe_sl
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 327488868.0
num_examples: 1000
download_size: 326257400
dataset_size: 327488868.0
- config_name: translate_cs
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 3966469835.0
num_examples: 12000
download_size: 3961047126
dataset_size: 3966469835.0
- config_name: translate_de
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 3495589100.0
num_examples: 12000
download_size: 3485953109
dataset_size: 3495589100.0
- config_name: translate_en
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 4000335785.0
num_examples: 12000
download_size: 3984268799
dataset_size: 4000335785.0
- config_name: translate_es
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 354059137.0
num_examples: 1000
download_size: 353045005
dataset_size: 354059137.0
- config_name: translate_fi
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 331638149.0
num_examples: 1000
download_size: 330815338
dataset_size: 331638149.0
- config_name: translate_fr
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 328745309.0
num_examples: 1000
download_size: 328018414
dataset_size: 328745309.0
- config_name: translate_hr
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 365396353.0
num_examples: 1000
download_size: 364705891
dataset_size: 365396353.0
- config_name: translate_hu
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 342234406.0
num_examples: 1000
download_size: 341838112
dataset_size: 342234406.0
- config_name: translate_it
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 387817353.0
num_examples: 1000
download_size: 387040742
dataset_size: 387817353.0
- config_name: translate_nl
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 248977383.0
num_examples: 1000
download_size: 248401676
dataset_size: 248977383.0
- config_name: translate_pl
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 321035993.0
num_examples: 1000
download_size: 320443680
dataset_size: 321035993.0
- config_name: translate_ro
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 357484418.0
num_examples: 1000
download_size: 356976503
dataset_size: 357484418.0
- config_name: translate_sk
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 326989680.0
num_examples: 1000
download_size: 326138608
dataset_size: 326989680.0
- config_name: translate_sl
features:
- name: audio_id
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: task
dtype: string
- name: language
dtype: string
splits:
- name: train
num_bytes: 327012282.0
num_examples: 1000
download_size: 325739820
dataset_size: 327012282.0
configs:
- config_name: multilang
data_files:
- split: train
path: multilang/train-*
- config_name: transcribe_cs
data_files:
- split: train
path: transcribe_cs/train-*
- config_name: transcribe_de
data_files:
- split: train
path: transcribe_de/train-*
- config_name: transcribe_en
data_files:
- split: train
path: transcribe_en/train-*
- config_name: transcribe_es
data_files:
- split: train
path: transcribe_es/train-*
- config_name: transcribe_fi
data_files:
- split: train
path: transcribe_fi/train-*
- config_name: transcribe_fr
data_files:
- split: train
path: transcribe_fr/train-*
- config_name: transcribe_hr
data_files:
- split: train
path: transcribe_hr/train-*
- config_name: transcribe_hu
data_files:
- split: train
path: transcribe_hu/train-*
- config_name: transcribe_it
data_files:
- split: train
path: transcribe_it/train-*
- config_name: transcribe_nl
data_files:
- split: train
path: transcribe_nl/train-*
- config_name: transcribe_pl
data_files:
- split: train
path: transcribe_pl/train-*
- config_name: transcribe_ro
data_files:
- split: train
path: transcribe_ro/train-*
- config_name: transcribe_sk
data_files:
- split: train
path: transcribe_sk/train-*
- config_name: transcribe_sl
data_files:
- split: train
path: transcribe_sl/train-*
- config_name: translate_cs
data_files:
- split: train
path: translate_cs/train-*
- config_name: translate_de
data_files:
- split: train
path: translate_de/train-*
- config_name: translate_en
data_files:
- split: train
path: translate_en/train-*
- config_name: translate_es
data_files:
- split: train
path: translate_es/train-*
- config_name: translate_fi
data_files:
- split: train
path: translate_fi/train-*
- config_name: translate_fr
data_files:
- split: train
path: translate_fr/train-*
- config_name: translate_hr
data_files:
- split: train
path: translate_hr/train-*
- config_name: translate_hu
data_files:
- split: train
path: translate_hu/train-*
- config_name: translate_it
data_files:
- split: train
path: translate_it/train-*
- config_name: translate_nl
data_files:
- split: train
path: translate_nl/train-*
- config_name: translate_pl
data_files:
- split: train
path: translate_pl/train-*
- config_name: translate_ro
data_files:
- split: train
path: translate_ro/train-*
- config_name: translate_sk
data_files:
- split: train
path: translate_sk/train-*
- config_name: translate_sl
data_files:
- split: train
path: translate_sl/train-*
---
# Dataset Card for "vp-er-14l"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
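
The card itself gives no usage instructions, so the following is only a minimal sketch of how one configuration might be loaded with the Hugging Face `datasets` library. The config names follow the `<task>_<language>` pattern seen in the metadata (plus the standalone `multilang` config, which would be passed as-is). The helpers `config_name` and `load_config` are hypothetical names introduced for illustration, and streaming is an assumption chosen here to avoid downloading several gigabytes up front.

```python
# Task prefixes and language codes as they appear in the config names of the metadata.
TASKS = ("transcribe", "translate")
LANGUAGES = ("cs", "de", "en", "es", "fi", "fr", "hr", "hu", "it", "nl", "pl", "ro", "sk", "sl")


def config_name(task: str, language: str) -> str:
    """Build a per-language config name such as 'transcribe_nl' (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    return f"{task}_{language}"


def load_config(task: str, language: str, streaming: bool = True):
    """Load the train split of one config (hypothetical helper, not part of the dataset)."""
    # Imported lazily so the naming helper works without the library installed.
    from datasets import load_dataset

    return load_dataset(
        "qmeeus/vp-er-14l",
        config_name(task, language),
        split="train",  # the metadata lists only a train split
        streaming=streaming,
    )
```

For example, `ds = load_config("transcribe", "nl")` followed by `next(iter(ds))` should yield one record with the fields `audio_id`, `audio`, `text`, `task`, and `language`. Decoding the `audio` column additionally needs an audio backend installed alongside `datasets`.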
Provider:
qmeeus
Original information summary
Dataset overview
Dataset configurations
Every configuration listed here has the same features and a single `train` split:
- audio_id: string
- audio: audio, sampling rate 16000 Hz
- text: string
- task: string
- language: string

In each configuration the dataset size equals the size of the `train` split, so the two are shown in one column.

| Configuration | Description | Train examples | Dataset size (bytes) | Download size (bytes) |
|---|---|---|---|---|
| multilang | Multilingual | 28000 | 9267861770 | 5714236793 |
| transcribe_cs | Czech transcription | 12000 | 3967671467 | 3962625704 |
| transcribe_de | German transcription | 12000 | 3496757286 | 3486791342 |
| transcribe_en | English transcription | 12000 | 4000348474 | 3984271576 |
| transcribe_es | Spanish transcription | 1000 | 354422163 | 353400896 |
| transcribe_fi | Finnish transcription | 1000 | 332239755 | 331416051 |
| transcribe_fr | French transcription | 1000 | 328771385 | 328033802 |
| transcribe_hr | Croatian transcription | 1000 | 365306344 | 364635100 |
| transcribe_hu | Hungarian transcription | 1000 | 341476551 | 341060381 |
| transcribe_it | Italian transcription | 1000 | 387784161 | 386989800 |
| transcribe_nl | Dutch transcription | 1000 | 248989364 | 248410994 |
| transcribe_pl | Polish transcription | 1000 | 321050950 | 320461314 |
| transcribe_ro | Romanian transcription | 1000 | 357399352 | 356905704 |
| transcribe_sk | Slovak transcription | 1000 | 327014013 | 326201959 |
| transcribe_sl | Slovenian transcription | 1000 | 327488868 | 326257400 |
| translate_cs | Czech translation | 12000 | 3966469835 | 3961047126 |
| translate_de | German translation | 12000 | 3495589100 | 3485953109 |
| translate_en | English translation | 12000 | 4000335785 | 3984268799 |
| translate_es | Spanish translation | 1000 | 354059137 | 353045005 |
| translate_fi | Finnish translation | 1000 | 331638149 | 330815338 |
| translate_fr | French translation | 1000 | 328745309 | 328018414 |
| translate_hr | Croatian translation | 1000 | 365396353 | 364705891 |
| translate_hu | Hungarian translation | 1000 | 342234406 | 341838112 |
| translate_it | Italian translation | 1000 | 387817353 | 387040742 |
| translate_nl | Dutch translation | 1000 | 248977383 | 248401676 |
| translate_pl | Polish translation | 1000 | 321035993 | 320443680 |
| translate_ro | Romanian translation | 1000 | 357484418 | 356976503 |
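
The sizes listed above are raw byte counts, which are hard to compare at a glance. The sketch below converts a few of them to GiB and computes the ratio of download size to dataset size. The byte values are copied from the metadata; the helper names `to_gib` and `download_ratio`, the choice of three sample configs, and the use of binary GiB (2^30 bytes) are assumptions made for illustration.

```python
# (download_size, dataset_size) in bytes, copied from the dataset metadata.
SIZES = {
    "multilang": (5714236793, 9267861770),
    "transcribe_cs": (3962625704, 3967671467),
    "transcribe_nl": (248410994, 248989364),
}


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to binary gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


def download_ratio(config: str) -> float:
    """Fraction of the prepared dataset size that has to be downloaded."""
    download, dataset = SIZES[config]
    return download / dataset


for name, (download, dataset) in SIZES.items():
    print(
        f"{name}: {to_gib(dataset):.2f} GiB prepared, "
        f"{to_gib(download):.2f} GiB to download, "
        f"ratio {download_ratio(name):.3f}"
    )
```

On these figures the `multilang` download is noticeably smaller than its prepared size, while the per-language configs download at close to their full size.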



