nguyenvulebinh/libris-asr-alignment
Source: Hugging Face. Updated 2024-01-04; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nguyenvulebinh/libris-asr-alignment
Resource description:
---
dataset_info:
- config_name: default
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: words
    sequence: string
  - name: word_start
    sequence: float64
  - name: word_end
    sequence: float64
  - name: entity_start
    sequence: int64
  - name: entity_end
    sequence: int64
  - name: entity_label
    sequence: string
  splits:
  - name: train
    num_bytes: 62881306.53508912
    num_examples: 282
  - name: valid
    num_bytes: 7162211.0760928225
    num_examples: 56
  download_size: 67766544
  dataset_size: 70043517.61118194
- config_name: libris
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: words
    sequence: string
  - name: word_start
    sequence: float64
  - name: word_end
    sequence: float64
  - name: entity_start
    sequence: int64
  - name: entity_end
    sequence: int64
  - name: entity_label
    sequence: string
  splits:
  - name: train
    num_bytes: 62881306.53508912
    num_examples: 282
  - name: valid
    num_bytes: 7162211.0760928225
    num_examples: 56
  download_size: 203299632
  dataset_size: 70043517.61118194
- config_name: mustc
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: words
    sequence: string
  - name: word_start
    sequence: float64
  - name: word_end
    sequence: float64
  - name: entity_start
    sequence: int64
  - name: entity_end
    sequence: int64
  - name: entity_label
    sequence: string
  splits:
  - name: train
    num_bytes: 55538132.852963656
    num_examples: 249
  - name: valid
    num_bytes: 2617438.3984375
    num_examples: 15
  download_size: 58416692
  dataset_size: 58155571.251401156
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: valid
    path: data/valid-*
- config_name: libris
  data_files:
  - split: train
    path: libris/train-*
  - split: valid
    path: libris/valid-*
- config_name: mustc
  data_files:
  - split: train
    path: mustc/train-*
  - split: valid
    path: mustc/valid-*
---
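
The metadata above declares three configs, each with a `train` and a `valid` split. A minimal sketch of loading one of them with the Hugging Face `datasets` library follows. The helper name `load_split` and the `EXPECTED_EXAMPLES` table are illustrative and not part of the dataset; the example counts are copied from the metadata.

```python
# Split sizes copied from the dataset_info metadata of this card.
EXPECTED_EXAMPLES = {
    "default": {"train": 282, "valid": 56},
    "libris": {"train": 282, "valid": 56},
    "mustc": {"train": 249, "valid": 15},
}

REPO_ID = "nguyenvulebinh/libris-asr-alignment"


def load_split(config: str = "default", split: str = "train"):
    """Load one split of one config (hypothetical helper).

    Config and split names are checked against the metadata first,
    so a typo fails fast instead of triggering a download attempt.
    """
    if config not in EXPECTED_EXAMPLES:
        raise ValueError(f"unknown config {config!r}")
    if split not in EXPECTED_EXAMPLES[config]:
        raise ValueError(f"unknown split {split!r} for config {config!r}")
    # Imported lazily so the checks above work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)


# Usage (needs network access and the `datasets` package):
#   ds = load_split("mustc", "valid")
#   print(len(ds), EXPECTED_EXAMPLES["mustc"]["valid"])
```

Comparing `len(ds)` with the expected count is a cheap sanity check that the download matches the card.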
Provider:
nguyenvulebinh
Summary of original information
Dataset overview
Dataset configurations
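Default configuration (default)
- Features:
  - id: string
  - text: string
  - audio: audio, sampling rate 16000 Hz
  - words: sequence of strings
  - word_start: sequence of floats (float64)
  - word_end: sequence of floats (float64)
  - entity_start: sequence of integers (int64)
  - entity_end: sequence of integers (int64)
  - entity_label: sequence of strings
- Splits:
  - train: 62881306.53508912 bytes, 282 examples
  - valid: 7162211.0760928225 bytes, 56 examples
- Download size: 67766544 bytes
- Dataset size: 70043517.61118194 bytes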
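Libris configuration (libris)
- Features:
  - id: string
  - text: string
  - audio: audio, sampling rate 16000 Hz
  - words: sequence of strings
  - word_start: sequence of floats (float64)
  - word_end: sequence of floats (float64)
  - entity_start: sequence of integers (int64)
  - entity_end: sequence of integers (int64)
  - entity_label: sequence of strings
- Splits:
  - train: 62881306.53508912 bytes, 282 examples
  - valid: 7162211.0760928225 bytes, 56 examples
- Download size: 203299632 bytes
- Dataset size: 70043517.61118194 bytes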
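MuST-C configuration (mustc)
- Features:
  - id: string
  - text: string
  - audio: audio, sampling rate 16000 Hz
  - words: sequence of strings
  - word_start: sequence of floats (float64)
  - word_end: sequence of floats (float64)
  - entity_start: sequence of integers (int64)
  - entity_end: sequence of integers (int64)
  - entity_label: sequence of strings
- Splits:
  - train: 55538132.852963656 bytes, 249 examples
  - valid: 2617438.3984375 bytes, 15 examples
- Download size: 58416692 bytes
- Dataset size: 58155571.251401156 bytes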
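
All three configs share the same feature schema, so one helper can pair each word with its timing. The sketch below rests on two assumptions the card does not state: that `words`, `word_start` and `word_end` are parallel sequences of equal length, and that the timestamps are in seconds. The record is made up for illustration, and the entity fields are left out because the card does not say what their integer offsets index.

```python
from typing import Dict, List, Tuple


def word_segments(example: Dict) -> List[Tuple[str, float, float]]:
    """Pair each word with its (start, end) time.

    Assumes the three sequences are parallel; raises if they are not.
    """
    words = example["words"]
    starts = example["word_start"]
    ends = example["word_end"]
    if not (len(words) == len(starts) == len(ends)):
        raise ValueError("words, word_start and word_end must have equal length")
    return list(zip(words, starts, ends))


def to_sample_range(start_s: float, end_s: float,
                    sampling_rate: int = 16000) -> Tuple[int, int]:
    """Convert a time span to audio sample indices.

    Assumes timestamps are in seconds; 16000 is the sampling rate
    declared for the audio feature.
    """
    return int(round(start_s * sampling_rate)), int(round(end_s * sampling_rate))


# Made-up record using the card's field names (not real data).
example = {
    "words": ["hello", "world"],
    "word_start": [0.25, 0.75],
    "word_end": [0.5, 1.25],
}

segments = word_segments(example)
for word, start, end in segments:
    first, last = to_sample_range(start, end)
    print(f"{word}: {start:.2f}s to {end:.2f}s, samples {first} to {last}")
```

The sample indices can be used to slice the decoded audio array for a single word, provided the seconds assumption holds for the real data.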
Data file paths
Default configuration (default)
- train: data/train-*
- valid: data/valid-*
Libris configuration (libris)
- train: libris/train-*
- valid: libris/valid-*
MuST-C configuration (mustc)
- train: mustc/train-*
- valid: mustc/valid-*
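
The paths above are glob patterns, so each split may be stored as one or more shard files. The sketch below maps a file path back to its config and split using Python's `fnmatchcase` as a stand-in for the Hub's own pattern matching, which may differ in detail. The shard file names are hypothetical; the card does not list the actual files.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# Patterns copied from the data file paths of this card.
DATA_FILES = {
    "default": {"train": "data/train-*", "valid": "data/valid-*"},
    "libris": {"train": "libris/train-*", "valid": "libris/valid-*"},
    "mustc": {"train": "mustc/train-*", "valid": "mustc/valid-*"},
}


def resolve(path: str) -> List[Tuple[str, str]]:
    """Return every (config, split) pair whose pattern matches the path."""
    return [
        (config, split)
        for config, splits in DATA_FILES.items()
        for split, pattern in splits.items()
        if fnmatchcase(path, pattern)
    ]


# Hypothetical shard names, for illustration only.
print(resolve("mustc/train-00000-of-00001.parquet"))
print(resolve("data/valid-00000-of-00001.parquet"))
print(resolve("README.md"))
```

Because each config keeps its files under its own directory prefix, a path matches at most one pattern.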



