
nguyenvulebinh/libris-asr-alignment

Source: Hugging Face. Updated 2024-01-04; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nguyenvulebinh/libris-asr-alignment
Resource description:
---
dataset_info:
- config_name: default
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: words
    sequence: string
  - name: word_start
    sequence: float64
  - name: word_end
    sequence: float64
  - name: entity_start
    sequence: int64
  - name: entity_end
    sequence: int64
  - name: entity_label
    sequence: string
  splits:
  - name: train
    num_bytes: 62881306.53508912
    num_examples: 282
  - name: valid
    num_bytes: 7162211.0760928225
    num_examples: 56
  download_size: 67766544
  dataset_size: 70043517.61118194
- config_name: libris
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: words
    sequence: string
  - name: word_start
    sequence: float64
  - name: word_end
    sequence: float64
  - name: entity_start
    sequence: int64
  - name: entity_end
    sequence: int64
  - name: entity_label
    sequence: string
  splits:
  - name: train
    num_bytes: 62881306.53508912
    num_examples: 282
  - name: valid
    num_bytes: 7162211.0760928225
    num_examples: 56
  download_size: 203299632
  dataset_size: 70043517.61118194
- config_name: mustc
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: words
    sequence: string
  - name: word_start
    sequence: float64
  - name: word_end
    sequence: float64
  - name: entity_start
    sequence: int64
  - name: entity_end
    sequence: int64
  - name: entity_label
    sequence: string
  splits:
  - name: train
    num_bytes: 55538132.852963656
    num_examples: 249
  - name: valid
    num_bytes: 2617438.3984375
    num_examples: 15
  download_size: 58416692
  dataset_size: 58155571.251401156
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: valid
    path: data/valid-*
- config_name: libris
  data_files:
  - split: train
    path: libris/train-*
  - split: valid
    path: libris/valid-*
- config_name: mustc
  data_files:
  - split: train
    path: mustc/train-*
  - split: valid
    path: mustc/valid-*
---
Provided by:
nguyenvulebinh
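Summary of original information

Dataset overview

Dataset configurations

Default configuration (default)

  • Features:
    • id: string
    • text: string
    • audio: audio, sampling rate 16000 Hz
    • words: sequence of strings
    • word_start: sequence of float64
    • word_end: sequence of float64
    • entity_start: sequence of int64
    • entity_end: sequence of int64
    • entity_label: sequence of strings
  • Splits:
    • train: 62881306.53508912 bytes, 282 examples
    • valid: 7162211.0760928225 bytes, 56 examples
  • Download size: 67766544 bytes
  • Dataset size: 70043517.61118194 bytes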
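
The parallel sequences words, word_start and word_end read naturally as one timing per word. The following is a minimal sketch of that reading, assuming the three sequences are index-aligned and that times are in seconds; the listing states neither. The record is synthetic and `word_timings` is a hypothetical helper, not part of the dataset or any library.

```python
from typing import Dict, List, Tuple


def word_timings(record: Dict) -> List[Tuple[str, float, float]]:
    """Pair each word with its (start, end) time.

    Assumes words, word_start and word_end are index-aligned.
    """
    words = record["words"]
    starts = record["word_start"]
    ends = record["word_end"]
    if not (len(words) == len(starts) == len(ends)):
        raise ValueError("words, word_start and word_end differ in length")
    return list(zip(words, starts, ends))


# Synthetic record that mirrors the listed feature names (not real data).
example = {
    "id": "demo-0",
    "text": "hello world",
    "words": ["hello", "world"],
    "word_start": [0.0, 0.5],
    "word_end": [0.4, 1.0],
}

for word, start, end in word_timings(example):
    print(f"{word}: {start:.2f} to {end:.2f}")
```

The entity_start, entity_end and entity_label fields are left out on purpose: the listing does not say whether the entity offsets index words or characters, so that should be checked against real records first.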

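Libris configuration (libris)

  • Features:
    • id: string
    • text: string
    • audio: audio, sampling rate 16000 Hz
    • words: sequence of strings
    • word_start: sequence of float64
    • word_end: sequence of float64
    • entity_start: sequence of int64
    • entity_end: sequence of int64
    • entity_label: sequence of strings
  • Splits:
    • train: 62881306.53508912 bytes, 282 examples
    • valid: 7162211.0760928225 bytes, 56 examples
  • Download size: 203299632 bytes
  • Dataset size: 70043517.61118194 bytes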

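MuST-C configuration (mustc)

  • Features:
    • id: string
    • text: string
    • audio: audio, sampling rate 16000 Hz
    • words: sequence of strings
    • word_start: sequence of float64
    • word_end: sequence of float64
    • entity_start: sequence of int64
    • entity_end: sequence of int64
    • entity_label: sequence of strings
  • Splits:
    • train: 55538132.852963656 bytes, 249 examples
    • valid: 2617438.3984375 bytes, 15 examples
  • Download size: 58416692 bytes
  • Dataset size: 58155571.251401156 bytes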
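
Each configuration can be selected by name when loading. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable from the Hub. `load_split` and `matches_listing` are hypothetical helpers, and the example counts are copied from the listing above.

```python
# Example counts per configuration and split, copied from the listing.
EXPECTED_EXAMPLES = {
    "default": {"train": 282, "valid": 56},
    "libris": {"train": 282, "valid": 56},
    "mustc": {"train": 249, "valid": 15},
}

REPO_ID = "nguyenvulebinh/libris-asr-alignment"


def load_split(config: str, split: str):
    """Load one split of one configuration from the Hugging Face Hub."""
    if config not in EXPECTED_EXAMPLES or split not in EXPECTED_EXAMPLES[config]:
        raise KeyError(f"unknown config/split: {config}/{split}")
    # Imported lazily so the table above is usable without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)


def matches_listing(config: str, split: str, num_rows: int) -> bool:
    """Check a loaded split's row count against the listed example count."""
    return EXPECTED_EXAMPLES[config][split] == num_rows
```

With network access, `ds = load_split("mustc", "valid")` followed by `matches_listing("mustc", "valid", ds.num_rows)` gives a quick sanity check that the download matches the listing.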

Data file paths

Default configuration (default)

  • train: data/train-*
  • valid: data/valid-*

Libris configuration (libris)

  • train: libris/train-*
  • valid: libris/valid-*

MuST-C configuration (mustc)

  • train: mustc/train-*
  • valid: mustc/valid-*
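
The path patterns above map repository files to a configuration and split. The following is a rough sketch of that mapping using Python's `fnmatch`, whose wildcard rules may differ in detail from the Hub's own pattern resolution. `classify` is a hypothetical helper, and the shard file names in the usage note are illustrative, not taken from the repository.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# Glob patterns per configuration and split, copied from the listing.
DATA_FILES = {
    "default": {"train": "data/train-*", "valid": "data/valid-*"},
    "libris": {"train": "libris/train-*", "valid": "libris/valid-*"},
    "mustc": {"train": "mustc/train-*", "valid": "mustc/valid-*"},
}


def classify(path: str) -> Optional[Tuple[str, str]]:
    """Return (config, split) for the first pattern that matches path."""
    for config, splits in DATA_FILES.items():
        for split, pattern in splits.items():
            # fnmatchcase keeps matching case-sensitive on every platform.
            if fnmatchcase(path, pattern):
                return config, split
    return None
```

For instance, a hypothetical shard named `mustc/valid-00000-of-00001.parquet` would be classified as `("mustc", "valid")`, while a file such as `README.md` matches no pattern and yields `None`.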