
qmeeus/vp-er-14l

Source: Hugging Face | Updated 2024-02-14 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/qmeeus/vp-er-14l
Resource description:
Dataset card metadata (`dataset_info`). All 29 configurations share the same five features and contain a single `train` split:

  • audio_id: string
  • audio: audio, sampling_rate 16000
  • text: string
  • task: string
  • language: string

| config_name | num_examples | num_bytes = dataset_size (bytes) | download_size (bytes) |
|---|---:|---:|---:|
| multilang | 28000 | 9267861770.0 | 5714236793 |
| transcribe_cs | 12000 | 3967671467.0 | 3962625704 |
| transcribe_de | 12000 | 3496757286.0 | 3486791342 |
| transcribe_en | 12000 | 4000348474.0 | 3984271576 |
| transcribe_es | 1000 | 354422163.0 | 353400896 |
| transcribe_fi | 1000 | 332239755.0 | 331416051 |
| transcribe_fr | 1000 | 328771385.0 | 328033802 |
| transcribe_hr | 1000 | 365306344.0 | 364635100 |
| transcribe_hu | 1000 | 341476551.0 | 341060381 |
| transcribe_it | 1000 | 387784161.0 | 386989800 |
| transcribe_nl | 1000 | 248989364.0 | 248410994 |
| transcribe_pl | 1000 | 321050950.0 | 320461314 |
| transcribe_ro | 1000 | 357399352.0 | 356905704 |
| transcribe_sk | 1000 | 327014013.0 | 326201959 |
| transcribe_sl | 1000 | 327488868.0 | 326257400 |
| translate_cs | 12000 | 3966469835.0 | 3961047126 |
| translate_de | 12000 | 3495589100.0 | 3485953109 |
| translate_en | 12000 | 4000335785.0 | 3984268799 |
| translate_es | 1000 | 354059137.0 | 353045005 |
| translate_fi | 1000 | 331638149.0 | 330815338 |
| translate_fr | 1000 | 328745309.0 | 328018414 |
| translate_hr | 1000 | 365396353.0 | 364705891 |
| translate_hu | 1000 | 342234406.0 | 341838112 |
| translate_it | 1000 | 387817353.0 | 387040742 |
| translate_nl | 1000 | 248977383.0 | 248401676 |
| translate_pl | 1000 | 321035993.0 | 320443680 |
| translate_ro | 1000 | 357484418.0 | 356976503 |
| translate_sk | 1000 | 326989680.0 | 326138608 |
| translate_sl | 1000 | 327012282.0 | 325739820 |

Data files (`configs`): each configuration's `train` split is read from `<config_name>/train-*`, for example `multilang/train-*`, `transcribe_cs/train-*` or `translate_sl/train-*`.

# Dataset Card for "vp-er-14l"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
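The configuration names in the dataset card metadata follow a regular pattern: `multilang`, plus `transcribe_<code>` and `translate_<code>` for each of the 14 language codes, with every `train` split stored under `<config_name>/train-*`. The sketch below is a hypothetical helper, not part of the dataset's own tooling: it rebuilds the name list and checks a file path against a configuration's glob using Python's `fnmatch`. With the Hugging Face `datasets` library, one of these names would be passed as the second argument, e.g. `load_dataset("qmeeus/vp-er-14l", "transcribe_en", split="train")`; that call downloads data and is not exercised here.

```python
from fnmatch import fnmatch

# Language codes used in the config names of qmeeus/vp-er-14l.
LANGUAGES = ["cs", "de", "en", "es", "fi", "fr", "hr",
             "hu", "it", "nl", "pl", "ro", "sk", "sl"]
TASKS = ["transcribe", "translate"]


def config_names() -> list[str]:
    """All config names: 'multilang' plus one '<task>_<lang>' per pair."""
    names = ["multilang"]
    for task in TASKS:
        names.extend(f"{task}_{lang}" for lang in LANGUAGES)
    return names


def train_glob(config_name: str) -> str:
    """Glob for a config's train shards, as declared under `configs`."""
    return f"{config_name}/train-*"


def belongs_to(path: str, config_name: str) -> bool:
    """True if a repository file path matches the config's train glob."""
    return fnmatch(path, train_glob(config_name))


if __name__ == "__main__":
    names = config_names()
    print(len(names))                  # 29
    print(train_glob("translate_de"))  # translate_de/train-*
    # Hypothetical shard file name, used only to exercise the glob.
    print(belongs_to("transcribe_en/train-00000-of-00009.parquet", "transcribe_en"))  # True
```

Generating the names instead of typing them by hand avoids slips such as confusing `sk` (Slovak) with `sl` (Slovenian).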
Provider:
qmeeus
Summary of original information

Dataset overview

Dataset configurations

Multilingual configuration (multilang)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 28000 examples, 9267861770.0 bytes
  • Download size: 5714236793 bytes
  • Dataset size: 9267861770.0 bytes

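Czech transcription configuration (transcribe_cs)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 12000 examples, 3967671467.0 bytes
  • Download size: 3962625704 bytes
  • Dataset size: 3967671467.0 bytes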

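German transcription configuration (transcribe_de)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 12000 examples, 3496757286.0 bytes
  • Download size: 3486791342 bytes
  • Dataset size: 3496757286.0 bytes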

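English transcription configuration (transcribe_en)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 12000 examples, 4000348474.0 bytes
  • Download size: 3984271576 bytes
  • Dataset size: 4000348474.0 bytes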

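Spanish transcription configuration (transcribe_es)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 354422163.0 bytes
  • Download size: 353400896 bytes
  • Dataset size: 354422163.0 bytes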

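Finnish transcription configuration (transcribe_fi)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 332239755.0 bytes
  • Download size: 331416051 bytes
  • Dataset size: 332239755.0 bytes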

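French transcription configuration (transcribe_fr)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 328771385.0 bytes
  • Download size: 328033802 bytes
  • Dataset size: 328771385.0 bytes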

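Croatian transcription configuration (transcribe_hr)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 365306344.0 bytes
  • Download size: 364635100 bytes
  • Dataset size: 365306344.0 bytes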

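Hungarian transcription configuration (transcribe_hu)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 341476551.0 bytes
  • Download size: 341060381 bytes
  • Dataset size: 341476551.0 bytes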

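Italian transcription configuration (transcribe_it)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 387784161.0 bytes
  • Download size: 386989800 bytes
  • Dataset size: 387784161.0 bytes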

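Dutch transcription configuration (transcribe_nl)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 248989364.0 bytes
  • Download size: 248410994 bytes
  • Dataset size: 248989364.0 bytes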

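Polish transcription configuration (transcribe_pl)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 321050950.0 bytes
  • Download size: 320461314 bytes
  • Dataset size: 321050950.0 bytes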

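Romanian transcription configuration (transcribe_ro)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 357399352.0 bytes
  • Download size: 356905704 bytes
  • Dataset size: 357399352.0 bytes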

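Slovak transcription configuration (transcribe_sk)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 327014013.0 bytes
  • Download size: 326201959 bytes
  • Dataset size: 327014013.0 bytes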

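Slovenian transcription configuration (transcribe_sl)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 327488868.0 bytes
  • Download size: 326257400 bytes
  • Dataset size: 327488868.0 bytes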

Czech translation configuration (translate_cs)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 12000 examples, 3966469835.0 bytes
  • Download size: 3961047126 bytes
  • Dataset size: 3966469835.0 bytes

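German translation configuration (translate_de)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 12000 examples, 3495589100.0 bytes
  • Download size: 3485953109 bytes
  • Dataset size: 3495589100.0 bytes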

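English translation configuration (translate_en)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 12000 examples, 4000335785.0 bytes
  • Download size: 3984268799 bytes
  • Dataset size: 4000335785.0 bytes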

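Spanish translation configuration (translate_es)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 354059137.0 bytes
  • Download size: 353045005 bytes
  • Dataset size: 354059137.0 bytes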

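Finnish translation configuration (translate_fi)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 331638149.0 bytes
  • Download size: 330815338 bytes
  • Dataset size: 331638149.0 bytes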

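French translation configuration (translate_fr)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 328745309.0 bytes
  • Download size: 328018414 bytes
  • Dataset size: 328745309.0 bytes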

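Croatian translation configuration (translate_hr)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 365396353.0 bytes
  • Download size: 364705891 bytes
  • Dataset size: 365396353.0 bytes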

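Hungarian translation configuration (translate_hu)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 342234406.0 bytes
  • Download size: 341838112 bytes
  • Dataset size: 342234406.0 bytes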

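Italian translation configuration (translate_it)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 387817353.0 bytes
  • Download size: 387040742 bytes
  • Dataset size: 387817353.0 bytes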

Dutch translation configuration (translate_nl)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 248977383.0 bytes
  • Download size: 248401676 bytes
  • Dataset size: 248977383.0 bytes

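Polish translation configuration (translate_pl)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 321035993.0 bytes
  • Download size: 320443680 bytes
  • Dataset size: 321035993.0 bytes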

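Romanian translation configuration (translate_ro)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 357484418.0 bytes
  • Download size: 356976503 bytes
  • Dataset size: 357484418.0 bytes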

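Slovak translation configuration (translate_sk)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 326989680.0 bytes
  • Download size: 326138608 bytes
  • Dataset size: 326989680.0 bytes

Slovenian translation configuration (translate_sl)

  • Features:
    • audio_id: string
    • audio: audio, sampling rate 16000 Hz
    • text: string
    • task: string
    • language: string
  • Splits:
    • train: 1000 examples, 327012282.0 bytes
  • Download size: 325739820 bytes
  • Dataset size: 327012282.0 bytes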
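Every configuration declares its `audio` feature at a sampling rate of 16000 Hz, so a decoded clip lasts (number of samples) / 16000 seconds. The sketch below assumes the decoded `audio` value is a dict with `array` and `sampling_rate` keys, which is how the `datasets` `Audio` feature decodes audio in its 2.x/3.x releases (newer releases may return a decoder object instead). The helper names `clip_seconds` and `total_hours` are hypothetical, and a synthetic example stands in for downloaded data.

```python
from typing import Any, Iterable

EXPECTED_RATE = 16_000  # sampling_rate declared for the `audio` feature of every config


def clip_seconds(example: dict[str, Any]) -> float:
    """Duration of one example's audio in seconds.

    Assumes example["audio"] is a dict holding the decoded samples under
    "array" and the rate in Hz under "sampling_rate".
    """
    audio = example["audio"]
    rate = audio["sampling_rate"]
    if rate != EXPECTED_RATE:
        raise ValueError(f"unexpected sampling rate {rate}, expected {EXPECTED_RATE}")
    return len(audio["array"]) / rate


def total_hours(examples: Iterable[dict[str, Any]]) -> float:
    """Total audio duration of an iterable of examples, in hours."""
    return sum(clip_seconds(ex) for ex in examples) / 3600


if __name__ == "__main__":
    # Synthetic stand-in for a decoded example: 40000 samples of silence.
    fake = {"audio": {"array": [0.0] * 40_000, "sampling_rate": 16_000}}
    print(clip_seconds(fake))  # 2.5
```

The explicit rate check is a design choice: it fails loudly if clips were resampled after loading, instead of silently reporting wrong durations.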
