distil-whisper/tedlium-prompted
Source: Hugging Face. Updated: 2023-09-18. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/distil-whisper/tedlium-prompted
Resource description:
---
dataset_info:
  config_name: release3
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: speaker_id
    dtype: string
  - name: gender
    dtype:
      class_label:
        names:
          '0': unknown
          '1': female
          '2': male
  - name: file
    dtype: string
  - name: id
    dtype: string
  - name: whisper_transcript_unprompted
    dtype: string
  - name: whisper_transcript
    dtype: string
  splits:
  - name: train
    num_bytes: 52484152554.125
    num_examples: 268263
  - name: validation
    num_bytes: 184679438.0
    num_examples: 507
  - name: test
    num_bytes: 302513272.625
    num_examples: 1155
  download_size: 52650349441
  dataset_size: 52971345264.75
configs:
- config_name: release3
  data_files:
  - split: train
    path: release3/train-*
  - split: validation
    path: release3/validation-*
  - split: test
    path: release3/test-*
---
# Dataset Card for "tedlium-prompted"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
distil-whisper
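Summary of original information
Dataset overview
Configuration
- Configuration name: release3
Features
- Audio:
  - Sampling rate: 16000 Hz
- Text:
  - Data type: string
- Speaker ID:
  - Data type: string
- Gender:
  - Data type: class label
  - Class names:
    - 0: unknown
    - 1: female
    - 2: male
- File:
  - Data type: string
- ID:
  - Data type: string
- Unprompted Whisper transcript:
  - Data type: string
- Whisper transcript:
  - Data type: string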
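The `gender` feature is stored as an integer class label, so downstream code usually needs to map between the integer and its name. The following is a minimal sketch of that mapping in plain Python. The helper names `gender_int2str` and `gender_str2int` are hypothetical and chosen for illustration; with the `datasets` library installed, the `ClassLabel` feature offers equivalent `int2str` and `str2int` methods.

```python
# Label order copied from the `gender` class_label definition of this dataset.
GENDER_NAMES = ["unknown", "female", "male"]


def gender_int2str(label: int) -> str:
    """Map a stored integer label to its class name (hypothetical helper)."""
    if not 0 <= label < len(GENDER_NAMES):
        raise ValueError(f"gender label out of range: {label}")
    return GENDER_NAMES[label]


def gender_str2int(name: str) -> int:
    """Map a class name back to the stored integer label (hypothetical helper)."""
    return GENDER_NAMES.index(name)


print(gender_int2str(1))       # female
print(gender_str2int("male"))  # 2
```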
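Data splits
- Training set:
  - Size in bytes: 52484152554.125
  - Number of examples: 268263
- Validation set:
  - Size in bytes: 184679438.0
  - Number of examples: 507
- Test set:
  - Size in bytes: 302513272.625
  - Number of examples: 1155
Data size
- Download size: 52650349441 bytes
- Dataset size: 52971345264.75 bytes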
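As a quick sanity check on the figures above, the per-split byte counts should add up to the reported dataset size. The sketch below verifies that and converts the sizes to more readable units. The variable and function names are illustrative, and the numbers are copied from the metadata.

```python
# Byte counts copied from the dataset metadata.
SPLIT_BYTES = {
    "train": 52484152554.125,
    "validation": 184679438.0,
    "test": 302513272.625,
}
DATASET_SIZE = 52971345264.75
DOWNLOAD_SIZE = 52650349441


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


total = sum(SPLIT_BYTES.values())
# The three splits account for the whole reported dataset size.
assert total == DATASET_SIZE

for name, size in SPLIT_BYTES.items():
    print(f"{name:<10} {to_gb(size):8.2f} GB  ({size / total:6.2%} of total)")
print(f"download   {to_gb(DOWNLOAD_SIZE):8.2f} GB")
```

The training split makes up almost all of the data, so the validation and test splits are the practical choice for a first look at the dataset.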
Data file paths
- Configuration name: release3
  - Training set: release3/train-*
  - Validation set: release3/validation-*
  - Test set: release3/test-*
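The configuration and split names listed here are what a loader needs. The sketch below shows one way to load a split with the Hugging Face `datasets` library, assuming it is installed and the repository is reachable. The helper names `load_split` and `preview` are hypothetical. Streaming is used so that the full download (about 52.7 GB) is not fetched up front.

```python
from itertools import islice
from typing import Any

REPO_ID = "distil-whisper/tedlium-prompted"
CONFIG = "release3"
SPLITS = ("train", "validation", "test")


def load_split(split: str, streaming: bool = True) -> Any:
    """Load one split of the release3 configuration (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the constants above are usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, CONFIG, split=split, streaming=streaming)


def preview(split: str = "validation", n: int = 2) -> None:
    """Print the reference text next to both Whisper transcripts for n rows."""
    for row in islice(load_split(split), n):
        print("text:                         ", row["text"])
        print("whisper_transcript_unprompted:", row["whisper_transcript_unprompted"])
        print("whisper_transcript:           ", row["whisper_transcript"])
```

Calling `preview()` needs network access, and iterating over rows also decodes the `audio` column, which may require an audio backend to be installed alongside `datasets`.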



