distil-whisper/librispeech_asr-prompted
Source: Hugging Face. Updated 2023-09-19. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/distil-whisper/librispeech_asr-prompted
Resource description:
---
dataset_info:
  config_name: all
  features:
  - name: file
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: speaker_id
    dtype: int64
  - name: chapter_id
    dtype: int64
  - name: id
    dtype: string
  - name: whisper_transcript_unprompted
    dtype: string
  - name: whisper_transcript
    dtype: string
  splits:
  - name: train.clean.100
    num_bytes: 6641615051.062
    num_examples: 28539
  - name: train.clean.360
    num_bytes: 23977966959.828
    num_examples: 104014
  - name: train.other.500
    num_bytes: 31918849882.584
    num_examples: 148688
  - name: validation.clean
    num_bytes: 361026354.966
    num_examples: 2703
  - name: validation.other
    num_bytes: 338707588.648
    num_examples: 2864
  - name: test.clean
    num_bytes: 369123744.42
    num_examples: 2620
  - name: test.other
    num_bytes: 353861942.154
    num_examples: 2939
  download_size: 61926395211
  dataset_size: 63961151523.662
configs:
- config_name: all
  data_files:
  - split: train.clean.100
    path: all/train.clean.100-*
  - split: train.clean.360
    path: all/train.clean.360-*
  - split: train.other.500
    path: all/train.other.500-*
  - split: validation.clean
    path: all/validation.clean-*
  - split: validation.other
    path: all/validation.other-*
  - split: test.clean
    path: all/test.clean-*
  - split: test.other
    path: all/test.other-*
---
# Dataset Card for "librispeech_asr-prompted"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
distil-whisper
Summary of original information
Dataset overview
Dataset information
- Config name: all
- Features:
  - File name: file (type: string)
  - Audio: audio (type: audio, sampling rate: 16000 Hz)
  - Text: text (type: string)
  - Speaker ID: speaker_id (type: int64)
  - Chapter ID: chapter_id (type: int64)
  - ID: id (type: string)
  - Unprompted Whisper transcript: whisper_transcript_unprompted (type: string)
  - Whisper transcript: whisper_transcript (type: string)
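Each record carries a reference transcript (`text`) alongside two Whisper outputs, so one natural use is to score the Whisper columns against the reference. The sketch below is only an illustration under stated assumptions: the `normalize` rule (lower-casing and dropping punctuation) is a simplification chosen here, not a normalizer documented by the dataset, and the `row` values are made up for the example rather than taken from the data.

```python
import re


def normalize(text: str) -> list[str]:
    """Lower-case, replace everything except letters, digits, apostrophes and spaces, then split."""
    cleaned = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical record: the field names match the feature list, the values are invented.
row = {
    "text": "HE HOPED THERE WOULD BE STEW FOR DINNER",
    "whisper_transcript": "He hoped there would be stew for dinner.",
    "whisper_transcript_unprompted": "He hoped there would be a stew for dinner.",
}

wer_prompted = word_error_rate(row["text"], row["whisper_transcript"])
wer_unprompted = word_error_rate(row["text"], row["whisper_transcript_unprompted"])
print(f"prompted WER: {wer_prompted:.3f}, unprompted WER: {wer_unprompted:.3f}")
```

For real evaluation a dedicated text normalizer and a tested WER library would likely be a better choice than this hand-rolled version.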
Data splits
- Training:
  - train.clean.100: 28539 examples, 6641615051.062 bytes
  - train.clean.360: 104014 examples, 23977966959.828 bytes
  - train.other.500: 148688 examples, 31918849882.584 bytes
- Validation:
  - validation.clean: 2703 examples, 361026354.966 bytes
  - validation.other: 2864 examples, 338707588.648 bytes
- Test:
  - test.clean: 2620 examples, 369123744.42 bytes
  - test.other: 2939 examples, 353861942.154 bytes
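The per-split counts above can be rolled up into totals per group. The following is a small sketch that uses only the example counts listed in this card; the grouping rule (everything before the first dot in the split name) and the helper name `group_totals` are choices made here for illustration.

```python
# Example counts copied from the split list above.
SPLIT_EXAMPLES = {
    "train.clean.100": 28539,
    "train.clean.360": 104014,
    "train.other.500": 148688,
    "validation.clean": 2703,
    "validation.other": 2864,
    "test.clean": 2620,
    "test.other": 2939,
}


def group_totals(counts: dict[str, int]) -> dict[str, int]:
    """Sum example counts by the prefix before the first dot (train, validation, test)."""
    totals: dict[str, int] = {}
    for split_name, num_examples in counts.items():
        group = split_name.split(".", 1)[0]
        totals[group] = totals.get(group, 0) + num_examples
    return totals


totals = group_totals(SPLIT_EXAMPLES)
grand_total = sum(SPLIT_EXAMPLES.values())
for group, count in totals.items():
    print(f"{group}: {count} examples ({count / grand_total:.1%} of all examples)")
```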
Dataset size
- Download size: 61926395211 bytes
- Dataset size: 63961151523.662 bytes
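As a quick sanity check, the per-split byte counts listed in this card can be summed and compared with the reported dataset size, and both sizes converted to GiB for readability. This is a minimal sketch using only numbers from the card; `to_gib` is a helper introduced here.

```python
import math

# Byte counts copied from the split list in this card.
SPLIT_BYTES = {
    "train.clean.100": 6641615051.062,
    "train.clean.360": 23977966959.828,
    "train.other.500": 31918849882.584,
    "validation.clean": 361026354.966,
    "validation.other": 338707588.648,
    "test.clean": 369123744.42,
    "test.other": 353861942.154,
}
DOWNLOAD_SIZE = 61926395211
DATASET_SIZE = 63961151523.662


def to_gib(num_bytes: float) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


total_bytes = math.fsum(SPLIT_BYTES.values())
sizes_agree = abs(total_bytes - DATASET_SIZE) < 0.01

print(f"sum of splits matches dataset_size: {sizes_agree}")
print(f"download: {to_gib(DOWNLOAD_SIZE):.2f} GiB, dataset: {to_gib(DATASET_SIZE):.2f} GiB")
```

Both figures come out a little under 60 GiB, which is worth knowing before downloading the full dataset rather than a single split.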
Config details
- Config name: all
- Data files:
  - train.clean.100: all/train.clean.100-*
  - train.clean.360: all/train.clean.360-*
  - train.other.500: all/train.other.500-*
  - validation.clean: all/validation.clean-*
  - validation.other: all/validation.other-*
  - test.clean: all/test.clean-*
  - test.other: all/test.other-*
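The data-file entries above are glob patterns of the form `all/<split>-*`. The sketch below shows how such a pattern selects shard files and how one split might be loaded with the Hugging Face `datasets` library. Treat it as an illustration under assumptions: the shard file names in the example are hypothetical, `shard_pattern` and `stream_split` are helpers introduced here, and `stream_split` needs the third-party `datasets` package plus network access, so it is defined but not called.

```python
from fnmatch import fnmatchcase

REPO_ID = "distil-whisper/librispeech_asr-prompted"
CONFIG = "all"
SPLITS = (
    "train.clean.100",
    "train.clean.360",
    "train.other.500",
    "validation.clean",
    "validation.other",
    "test.clean",
    "test.other",
)


def shard_pattern(split: str) -> str:
    """Return the data-file glob listed in the card for a given split."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"{CONFIG}/{split}-*"


def stream_split(split: str):
    """Stream one split without downloading the whole dataset first."""
    from datasets import load_dataset  # third-party: pip install datasets

    shard_pattern(split)  # validates the split name before any network call
    return load_dataset(REPO_ID, CONFIG, split=split, streaming=True)


# Hypothetical shard names, used only to show how the glob selects files.
example_files = [
    "all/validation.clean-00000-of-00001.parquet",
    "all/validation.other-00000-of-00001.parquet",
]
selected = [f for f in example_files if fnmatchcase(f, shard_pattern("validation.clean"))]
print(selected)
```

Streaming is used here because the full download is tens of gigabytes; for a small split such as `validation.clean`, a regular non-streaming load may be just as practical.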



