
qmeeus/vp-er-10l

Source: Hugging Face; updated 2024-03-28; indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/qmeeus/vp-er-10l
Resource description:
---
dataset_info:
- config_name: cs
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  splits:
  - {name: train, num_bytes: 3968868756, num_examples: 12000}
  download_size: 3963196917
  dataset_size: 3968868756
- config_name: de
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 3498200501, num_examples: 12000}
  download_size: 3487997831
  dataset_size: 3498200501
- config_name: en
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 4000276474, num_examples: 12000}
  download_size: 3984332876
  dataset_size: 4000276474
- config_name: es
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 4138004589, num_examples: 12000}
  download_size: 4128702065
  dataset_size: 4138004589
- config_name: fr
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 3915210199, num_examples: 12000}
  download_size: 3906302179
  dataset_size: 3915210199
- config_name: hu
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 4174219387, num_examples: 12000}
  download_size: 4167484051
  dataset_size: 4174219387
- config_name: it
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 4732854879, num_examples: 12000}
  download_size: 4722455587
  dataset_size: 4732854879
- config_name: nl
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 3162694343, num_examples: 12000}
  download_size: 3154090731
  dataset_size: 3162694343
- config_name: pl
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 4041042730, num_examples: 12000}
  download_size: 4033450852
  dataset_size: 4041042730
- config_name: ro
  features:
  - {name: audio_id, dtype: string}
  - {name: audio, dtype: {audio: {sampling_rate: 16000}}}
  - {name: language, dtype: string}
  - {name: transcription, dtype: string}
  - {name: translation, dtype: string}
  - {name: wer, dtype: float32}
  splits:
  - {name: train, num_bytes: 4341972777, num_examples: 12000}
  download_size: 4334737748
  dataset_size: 4341972777
configs:
- config_name: cs
  data_files:
  - split: train
    path: cs/train-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hu
  data_files:
  - split: train
    path: hu/train-*
- config_name: it
  data_files:
  - split: train
    path: it/train-*
- config_name: nl
  data_files:
  - split: train
    path: nl/train-*
- config_name: pl
  data_files:
  - split: train
    path: pl/train-*
- config_name: ro
  data_files:
  - split: train
    path: ro/train-*
language:
- cs
- de
- en
- es
- fr
- hu
- it
- nl
- pl
- ro
tags:
- speech-to-text
- speech-translation
- automatic-speech-recognition
- language-detection
---

# Dataset Card for "vp-er-10l"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
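Below is a minimal sketch of loading one language configuration with the Hugging Face `datasets` library. It assumes the package is installed and that the Hub repository `qmeeus/vp-er-10l` is reachable. `streaming=True` iterates over records without first downloading the whole train split, which is roughly 3 to 5 GB per config. The helper name `load_config` is hypothetical.

```python
# Config names taken from the `configs` section of the dataset card.
CONFIGS = ("cs", "de", "en", "es", "fr", "hu", "it", "nl", "pl", "ro")
REPO_ID = "qmeeus/vp-er-10l"


def load_config(lang: str, streaming: bool = True):
    """Hypothetical helper: load the train split of one language config.

    Assumes the `datasets` package is installed and the Hub is reachable.
    """
    if lang not in CONFIGS:
        raise ValueError(f"unknown config {lang!r}; expected one of {CONFIGS}")
    # Imported lazily so the config-name check works without `datasets`.
    from datasets import load_dataset

    # Each config declares a single "train" split in its `data_files`.
    return load_dataset(REPO_ID, lang, split="train", streaming=streaming)
```

For example, `next(iter(load_config("de")))` should return the first German record as a dict keyed by the columns listed in the card (`audio_id`, `audio`, `language`, `transcription`, `translation`, `wer`).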

Provider:
qmeeus
Summary of original information

Dataset overview

Dataset configurations

Config name: cs

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
  • Splits:
    • train: 3968868756 bytes, 12000 examples
  • Download size: 3963196917 bytes
  • Dataset size: 3968868756 bytes
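The `cs` configuration lists five columns. Every other configuration adds a sixth, a float32 `wer` column. The plain-Python sketch below (no `datasets` dependency) records that schema and checks a record's keys against it. The names `expected_columns` and `check_record` are hypothetical.

```python
from typing import Any, FrozenSet, Mapping

# Columns shared by every configuration, per the dataset card.
BASE_COLUMNS = frozenset({"audio_id", "audio", "language", "transcription", "translation"})
# Every config except `cs` also carries a float32 `wer` column.
CONFIGS_WITH_WER = frozenset({"de", "en", "es", "fr", "hu", "it", "nl", "pl", "ro"})


def expected_columns(config: str) -> FrozenSet[str]:
    """Return the column names the card lists for `config`."""
    if config == "cs":
        return BASE_COLUMNS
    if config in CONFIGS_WITH_WER:
        return BASE_COLUMNS | {"wer"}
    raise ValueError(f"unknown config {config!r}")


def check_record(config: str, record: Mapping[str, Any]) -> None:
    """Hypothetical helper: raise if a record's keys differ from the card."""
    expected = expected_columns(config)
    actual = frozenset(record)
    if actual != expected:
        missing = sorted(expected - actual)
        extra = sorted(actual - expected)
        raise ValueError(f"{config}: missing={missing} extra={extra}")
```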

Config name: de

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 3498200501 bytes, 12000 examples
  • Download size: 3487997831 bytes
  • Dataset size: 3498200501 bytes
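The card types `wer` only as float32. It does not say how the value was computed or what reference it was measured against. Reading it as a word error rate (lower is better) is an assumption. Under that assumption, the sketch below keeps only low-`wer` records. The threshold `0.1` and the helper name `keep_low_wer` are hypothetical. Records with no `wer` key, as in `cs`, are passed through.

```python
from typing import Any, Iterable, Iterator, Mapping


def keep_low_wer(
    records: Iterable[Mapping[str, Any]], max_wer: float = 0.1
) -> Iterator[Mapping[str, Any]]:
    """Yield records whose `wer` is at most `max_wer` (assumed lower = better).

    Records without a `wer` key (e.g. the `cs` config) are passed through,
    since there is nothing to filter on.
    """
    for record in records:
        wer = record.get("wer")
        if wer is None or wer <= max_wer:
            yield record
```

On a loaded `datasets` dataset, the same condition can be passed to the dataset's own `filter` method, for example `ds.filter(lambda r: r["wer"] <= 0.1)`.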

Config name: en

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 4000276474 bytes, 12000 examples
  • Download size: 3984332876 bytes
  • Dataset size: 4000276474 bytes

Config name: es

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 4138004589 bytes, 12000 examples
  • Download size: 4128702065 bytes
  • Dataset size: 4138004589 bytes

Config name: fr

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 3915210199 bytes, 12000 examples
  • Download size: 3906302179 bytes
  • Dataset size: 3915210199 bytes

Config name: hu

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 4174219387 bytes, 12000 examples
  • Download size: 4167484051 bytes
  • Dataset size: 4174219387 bytes

Config name: it

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 4732854879 bytes, 12000 examples
  • Download size: 4722455587 bytes
  • Dataset size: 4732854879 bytes

Config name: nl

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 3162694343 bytes, 12000 examples
  • Download size: 3154090731 bytes
  • Dataset size: 3162694343 bytes

Config name: pl

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 4041042730 bytes, 12000 examples
  • Download size: 4033450852 bytes
  • Dataset size: 4041042730 bytes

Config name: ro

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • language: string
    • transcription: string
    • translation: string
    • wer: float32
  • Splits:
    • train: 4341972777 bytes, 12000 examples
  • Download size: 4334737748 bytes
  • Dataset size: 4341972777 bytes
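The per-config figures can be aggregated with a little arithmetic. The byte counts below are copied from the `num_bytes` values in the card, and every config has 12000 train examples. The variable names are illustrative only.

```python
# Train-split sizes (num_bytes) per config, copied from the dataset card.
TRAIN_BYTES = {
    "cs": 3968868756,
    "de": 3498200501,
    "en": 4000276474,
    "es": 4138004589,
    "fr": 3915210199,
    "hu": 4174219387,
    "it": 4732854879,
    "nl": 3162694343,
    "pl": 4041042730,
    "ro": 4341972777,
}
EXAMPLES_PER_CONFIG = 12000

total_bytes = sum(TRAIN_BYTES.values())
total_examples = EXAMPLES_PER_CONFIG * len(TRAIN_BYTES)
largest = max(TRAIN_BYTES, key=TRAIN_BYTES.get)
smallest = min(TRAIN_BYTES, key=TRAIN_BYTES.get)

for config, size in sorted(TRAIN_BYTES.items()):
    # Decimal gigabytes, plus average stored bytes per example in kB.
    per_example_kb = size / EXAMPLES_PER_CONFIG / 1e3
    print(f"{config}: {size / 1e9:.2f} GB, {per_example_kb:.1f} kB/example")
print(f"total: {total_bytes / 1e9:.2f} GB over {total_examples} examples")
```

Summed, the ten train splits come to 39973344635 bytes (about 40 GB) and 120000 examples. `it` is the largest config and `nl` the smallest.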

Data files

  • cs:
    • train: cs/train-*
  • de:
    • train: de/train-*
  • en:
    • train: en/train-*
  • es:
    • train: es/train-*
  • fr:
    • train: fr/train-*
  • hu:
    • train: hu/train-*
  • it:
    • train: it/train-*
  • nl:
    • train: nl/train-*
  • pl:
    • train: pl/train-*
  • ro:
    • train: ro/train-*
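Each config's train split is declared by a glob such as `cs/train-*`. Every file directly inside that config's directory whose name starts with `train-` therefore belongs to the split. The sketch below uses Python's `fnmatch` and assumes, as is usual for glob patterns, that `*` does not match across `/`. The shard file names in the test are hypothetical, not the repository's actual listing.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# data_files patterns from the card's `configs` section.
TRAIN_PATTERNS = {
    lang: f"{lang}/train-*"
    for lang in ("cs", "de", "en", "es", "fr", "hu", "it", "nl", "pl", "ro")
}


def matches(pattern: str, path: str) -> bool:
    """Match a POSIX path against a glob, comparing one path component at a time
    so that `*` never crosses a `/`."""
    pat_parts = pattern.split("/")
    path_parts = path.split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts)
    )


def files_for(config: str, paths: Iterable[str]) -> List[str]:
    """Return the paths that belong to `config`'s train split."""
    return [p for p in paths if matches(TRAIN_PATTERNS[config], p)]
```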

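Languages

  • cs (Czech)
  • de (German)
  • en (English)
  • es (Spanish)
  • fr (French)
  • hu (Hungarian)
  • it (Italian)
  • nl (Dutch)
  • pl (Polish)
  • ro (Romanian)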

Tags

  • speech-to-text
  • speech-translation
  • automatic-speech-recognition
  • language-detection