theodorr/librispeech_asr_encodec

Name: theodorr/librispeech_asr_encodec
Creator: theodorr
Published: 2024-07-07 21:11:58
License: No description available

Source: Hugging Face. Updated 2024-07-07; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/theodorr/librispeech_asr_encodec

Overview:

The dataset contains six fields: file path, audio data, text, speaker ID, chapter ID, and a unique ID. The audio has a sampling rate of 24000 Hz. The data is divided into seven splits (train.clean.100, train.clean.360, train.other.500, validation.clean, validation.other, test.clean, and test.other), each listed with its size in bytes and its number of examples. The download size is 61838030296 bytes, and the total dataset size is 63779756983.662 bytes.

Provider:

theodorr

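Summary of original information

Dataset overview

Data features

file: File name; data type is string.
audio: Audio data; sampling rate is 24000 Hz.
text: Text data; data type is string.
speaker_id: Speaker ID; data type is integer.
chapter_id: Chapter ID; data type is integer.
id: Unique identifier; data type is string.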
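
As a rough illustration of the schema above, the sketch below checks that a record carries the six listed fields with the stated types. The `check_record` helper and the sample values are hypothetical, and the shape of the decoded `audio` field (a mapping with a `sampling_rate` key) is an assumption, since the listing only states the sampling rate.

```python
# Expected Python types for the scalar fields listed above.
EXPECTED_TYPES = {
    "file": str,
    "text": str,
    "speaker_id": int,
    "chapter_id": int,
    "id": str,
}
EXPECTED_SAMPLING_RATE = 24000  # Hz, as stated in the listing


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    # Assumption: decoded audio is a mapping exposing "sampling_rate".
    audio = record.get("audio")
    if audio is None:
        problems.append("missing field: audio")
    elif audio.get("sampling_rate") != EXPECTED_SAMPLING_RATE:
        problems.append("audio: unexpected sampling rate")
    return problems


# Hypothetical record, made up for illustration only.
sample = {
    "file": "example.flac",
    "audio": {"sampling_rate": 24000},
    "text": "HELLO WORLD",
    "speaker_id": 1,
    "chapter_id": 2,
    "id": "1-2-0000",
}
print(check_record(sample))
```

A check of this kind is mainly useful as a sanity test on the first few records of a split before starting a long preprocessing job.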

Dataset splits

train.clean.100: 28539 examples, 6623055766.062 bytes in total.
train.clean.360: 104014 examples, 23910553121.828 bytes in total.
train.other.500: 148688 examples, 31827871203.584 bytes in total.
validation.clean: 2703 examples, 359892375.966 bytes in total.
validation.other: 2864 examples, 337622897.648 bytes in total.
test.clean: 2620 examples, 368016566.42 bytes in total.
test.other: 2939 examples, 352745052.154 bytes in total.
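
The per-split figures can be cross-checked against the reported total dataset size of 63779756983.662 bytes. The sketch below sums them with `Decimal` to avoid floating-point rounding on the fractional byte counts; the `SPLITS` table simply restates the numbers from the listing, and the variable names are illustrative.

```python
from decimal import Decimal

# (number of examples, size in bytes) per split, copied from the listing.
SPLITS = {
    "train.clean.100": (28539, Decimal("6623055766.062")),
    "train.clean.360": (104014, Decimal("23910553121.828")),
    "train.other.500": (148688, Decimal("31827871203.584")),
    "validation.clean": (2703, Decimal("359892375.966")),
    "validation.other": (2864, Decimal("337622897.648")),
    "test.clean": (2620, Decimal("368016566.42")),
    "test.other": (2939, Decimal("352745052.154")),
}

total_examples = sum(count for count, _ in SPLITS.values())
total_bytes = sum(size for _, size in SPLITS.values())

print(total_examples)  # 292367
print(total_bytes)     # 63779756983.662
```

The summed size agrees with the reported total dataset size, and the seven splits together hold 292367 examples.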

Dataset size

Download size: 61838030296 bytes.
Total dataset size: 63779756983.662 bytes.
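
Raw byte counts of this magnitude are hard to read at a glance. A minimal sketch of a converter to decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes) follows; `human_size` is a hypothetical helper name, not part of any library.

```python
DOWNLOAD_BYTES = 61838030296
TOTAL_BYTES = 63779756983.662


def human_size(num_bytes):
    """Format a byte count as decimal GB and binary GiB, one decimal place."""
    gb = num_bytes / 10**9
    gib = num_bytes / 2**30
    return f"{gb:.1f} GB ({gib:.1f} GiB)"


print("download:", human_size(DOWNLOAD_BYTES))  # 61.8 GB (57.6 GiB)
print("total:", human_size(TOTAL_BYTES))        # 63.8 GB (59.4 GiB)
```

In practice this suggests planning for roughly 62 GB of download and a similar amount of disk space for the prepared data.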

Configuration

config_name: default
- data_files:
  - train.clean.100: path data/train.clean.100-*
  - train.clean.360: path data/train.clean.360-*
  - train.other.500: path data/train.other.500-*
  - validation.clean: path data/validation.clean-*
  - validation.other: path data/validation.other-*
  - test.clean: path data/test.clean-*
  - test.other: path data/test.other-*
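
The configuration above maps each split to a glob pattern over shard files. The sketch below shows how such patterns resolve a file path to a split, and how a single split might be loaded with the Hugging Face `datasets` library. It assumes `datasets` is installed and the repository is reachable; `split_for_file` and `load_split` are hypothetical helper names, and the shard file name in the usage comment is made up for illustration.

```python
from fnmatch import fnmatchcase

REPO_ID = "theodorr/librispeech_asr_encodec"

# Split name -> data file pattern, copied from the default config.
SPLIT_PATTERNS = {
    "train.clean.100": "data/train.clean.100-*",
    "train.clean.360": "data/train.clean.360-*",
    "train.other.500": "data/train.other.500-*",
    "validation.clean": "data/validation.clean-*",
    "validation.other": "data/validation.other-*",
    "test.clean": "data/test.clean-*",
    "test.other": "data/test.other-*",
}


def split_for_file(path):
    """Return the split whose pattern matches the path, or None."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def load_split(split, streaming=True):
    """Load one split; streaming avoids downloading every shard up front."""
    if split not in SPLIT_PATTERNS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the pattern helper works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)


# Hypothetical shard name, used only to exercise the pattern matching.
print(split_for_file("data/test.other-00000-of-00001.parquet"))
```

Starting with a small split such as `validation.clean` and `streaming=True` is a reasonable way to inspect a few records before committing to the full download.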
