qmeeus/vp-er-10l
Source: Hugging Face. Updated 2024-03-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/qmeeus/vp-er-10l
Description:
---
dataset_info:
- config_name: cs
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  splits:
  - name: train
    num_bytes: 3968868756
    num_examples: 12000
  download_size: 3963196917
  dataset_size: 3968868756
- config_name: de
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 3498200501
    num_examples: 12000
  download_size: 3487997831
  dataset_size: 3498200501
- config_name: en
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 4000276474
    num_examples: 12000
  download_size: 3984332876
  dataset_size: 4000276474
- config_name: es
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 4138004589
    num_examples: 12000
  download_size: 4128702065
  dataset_size: 4138004589
- config_name: fr
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 3915210199
    num_examples: 12000
  download_size: 3906302179
  dataset_size: 3915210199
- config_name: hu
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 4174219387
    num_examples: 12000
  download_size: 4167484051
  dataset_size: 4174219387
- config_name: it
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 4732854879
    num_examples: 12000
  download_size: 4722455587
  dataset_size: 4732854879
- config_name: nl
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 3162694343
    num_examples: 12000
  download_size: 3154090731
  dataset_size: 3162694343
- config_name: pl
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 4041042730
    num_examples: 12000
  download_size: 4033450852
  dataset_size: 4041042730
- config_name: ro
  features:
  - name: audio_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: language
    dtype: string
  - name: transcription
    dtype: string
  - name: translation
    dtype: string
  - name: wer
    dtype: float32
  splits:
  - name: train
    num_bytes: 4341972777
    num_examples: 12000
  download_size: 4334737748
  dataset_size: 4341972777
configs:
- config_name: cs
  data_files:
  - split: train
    path: cs/train-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hu
  data_files:
  - split: train
    path: hu/train-*
- config_name: it
  data_files:
  - split: train
    path: it/train-*
- config_name: nl
  data_files:
  - split: train
    path: nl/train-*
- config_name: pl
  data_files:
  - split: train
    path: pl/train-*
- config_name: ro
  data_files:
  - split: train
    path: ro/train-*
language:
- cs
- de
- en
- es
- fr
- hu
- it
- nl
- pl
- ro
tags:
- speech-to-text
- speech-translation
- automatic-speech-recognition
- language-detection
---
# Dataset Card for "vp-er-10l"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
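
The card itself gives no usage instructions. As a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository `qmeeus/vp-er-10l` is reachable, one language configuration could be loaded as follows. The helper name `load_config` is hypothetical and not part of the dataset or the library.

```python
# Configuration names taken from the card's `configs` list.
CONFIGS = ("cs", "de", "en", "es", "fr", "hu", "it", "nl", "pl", "ro")


def load_config(lang, streaming=True):
    """Load the train split of one language configuration.

    Assumes the `datasets` library is installed and the Hub is reachable.
    """
    if lang not in CONFIGS:
        raise ValueError(f"unknown configuration {lang!r}; expected one of {CONFIGS}")
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    # With streaming=True, examples are fetched on demand instead of
    # downloading the whole configuration (several GB each) up front.
    return load_dataset("qmeeus/vp-er-10l", lang, split="train", streaming=streaming)
```

Iterating over `load_config("nl")` should yield dictionaries keyed by the feature names listed in the metadata (`audio_id`, `audio`, `language`, `transcription`, `translation`, and `wer` where present).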
Provider:
qmeeus
Summary of the original information
Dataset overview
Dataset configurations
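Configuration: cs
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
- Splits:
  - train: 3968868756 bytes, 12000 examples
- Download size: 3963196917 bytes
- Dataset size: 3968868756 bytes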
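Configuration: de
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 3498200501 bytes, 12000 examples
- Download size: 3487997831 bytes
- Dataset size: 3498200501 bytes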
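Configuration: en
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 4000276474 bytes, 12000 examples
- Download size: 3984332876 bytes
- Dataset size: 4000276474 bytes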
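Configuration: es
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 4138004589 bytes, 12000 examples
- Download size: 4128702065 bytes
- Dataset size: 4138004589 bytes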
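Configuration: fr
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 3915210199 bytes, 12000 examples
- Download size: 3906302179 bytes
- Dataset size: 3915210199 bytes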
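Configuration: hu
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 4174219387 bytes, 12000 examples
- Download size: 4167484051 bytes
- Dataset size: 4174219387 bytes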
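Configuration: it
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 4732854879 bytes, 12000 examples
- Download size: 4722455587 bytes
- Dataset size: 4732854879 bytes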
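Configuration: nl
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 3162694343 bytes, 12000 examples
- Download size: 3154090731 bytes
- Dataset size: 3162694343 bytes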
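Configuration: pl
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 4041042730 bytes, 12000 examples
- Download size: 4033450852 bytes
- Dataset size: 4041042730 bytes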
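Configuration: ro
- Features:
  - audio_id: string
  - audio: audio, sampling rate 16000 Hz
  - language: string
  - transcription: string
  - translation: string
  - wer: float32
- Splits:
  - train: 4341972777 bytes, 12000 examples
- Download size: 4334737748 bytes
- Dataset size: 4341972777 bytes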
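
To get a feel for the overall footprint, the per-configuration figures above can be aggregated. The following is a small sketch that only does arithmetic on the byte and example counts copied from the card; the variable names are illustrative.

```python
# (num_bytes, num_examples) of each train split, copied from the card.
TRAIN_SPLITS = {
    "cs": (3968868756, 12000),
    "de": (3498200501, 12000),
    "en": (4000276474, 12000),
    "es": (4138004589, 12000),
    "fr": (3915210199, 12000),
    "hu": (4174219387, 12000),
    "it": (4732854879, 12000),
    "nl": (3162694343, 12000),
    "pl": (4041042730, 12000),
    "ro": (4341972777, 12000),
}

total_bytes = sum(num_bytes for num_bytes, _ in TRAIN_SPLITS.values())
total_examples = sum(num_examples for _, num_examples in TRAIN_SPLITS.values())

# Configurations with the largest and smallest train split by size.
largest = max(TRAIN_SPLITS, key=lambda lang: TRAIN_SPLITS[lang][0])
smallest = min(TRAIN_SPLITS, key=lambda lang: TRAIN_SPLITS[lang][0])

print(f"{total_examples} examples, {total_bytes / 1e9:.2f} GB in total")
print(f"largest: {largest}, smallest: {smallest}")
for lang, (num_bytes, num_examples) in TRAIN_SPLITS.items():
    # Average stored size per example, in kilobytes.
    print(f"{lang}: {num_bytes / num_examples / 1e3:.0f} kB per example")
```

Since every configuration has the same number of examples, the size differences reflect differences in average stored size per example.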
Data files
- cs:
  train: cs/train-*
- de:
  train: de/train-*
- en:
  train: en/train-*
- es:
  train: es/train-*
- fr:
  train: fr/train-*
- hu:
  train: hu/train-*
- it:
  train: it/train-*
- nl:
  train: nl/train-*
- pl:
  train: pl/train-*
- ro:
  train: ro/train-*
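
Each data-file entry is a glob pattern rather than a single file name. The sketch below illustrates how such patterns select files, using Python's `fnmatch` as a stand-in; the Hub's own pattern matching may differ in detail, and the shard file names are hypothetical, since the card does not list the actual files.

```python
from fnmatch import fnmatch

# Patterns copied from the card's data-file list.
LANGS = ("cs", "de", "en", "es", "fr", "hu", "it", "nl", "pl", "ro")
PATTERNS = {lang: f"{lang}/train-*" for lang in LANGS}


def config_for(path):
    """Return the configuration whose train pattern matches `path`, or None."""
    for lang, pattern in PATTERNS.items():
        if fnmatch(path, pattern):
            return lang
    return None


# Hypothetical repository paths, for illustration only.
print(config_for("cs/train-00000-of-00008.parquet"))
print(config_for("README.md"))
```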
Languages
- cs
- de
- en
- es
- fr
- hu
- it
- nl
- pl
- ro
Tags
- speech-to-text
- speech-translation
- automatic-speech-recognition
- language-detection
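
Every configuration except `cs` carries a `wer` column, but the card does not say how it was computed or which texts were compared. For orientation only, here is a sketch of the standard word error rate definition (word-level edit distance divided by the number of reference words). It is not claimed to be the procedure used for this dataset.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.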



