theodorr/librittsr_encodec

Name: theodorr/librittsr_encodec
Creator: theodorr
Published: 2024-07-19 18:01:28
License: No description available

Source: Hugging Face. Updated 2024-07-19; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/theodorr/librittsr_encodec

Resource overview:

This dataset contains audio and text data. Each record has five features: audio tokens (audio_token), text (text), speaker (speaker), book (book), and audio duration (audio_duration). The data is divided into three splits, train_clean_360, train_other_500, and train_clean_100, containing 116451, 205035, and 33232 samples respectively (354718 in total), with a total size of 9609238315 bytes.

Provider:

theodorr

Summary of original information

Dataset overview

Dataset features

audio_token: nested sequence whose inner values are int64.
text: string.
speaker: string.
book: string.
audio_duration: floating-point number.
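
To make the feature list concrete, here is a minimal sketch of one record and a type check for it, using only the Python standard library. The names `Record` and `validate` are hypothetical and not part of the dataset. Treating audio_duration as seconds is an assumption, since the listing does not state a unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """One row, mirroring the five listed features (hypothetical class)."""
    audio_token: List[List[int]]  # nested sequence of int64 values
    text: str
    speaker: str
    book: str
    audio_duration: float  # assumed to be seconds; the listing gives no unit


def validate(row: dict) -> Record:
    """Check a raw row dict against the listed types and return a Record."""
    tokens = row["audio_token"]
    if not all(
        isinstance(seq, list) and all(isinstance(t, int) for t in seq)
        for seq in tokens
    ):
        raise TypeError("audio_token must be a nested sequence of integers")
    for key in ("text", "speaker", "book"):
        if not isinstance(row[key], str):
            raise TypeError(f"{key} must be a string")
    if not isinstance(row["audio_duration"], float):
        raise TypeError("audio_duration must be a float")
    # Keep only the five documented fields, ignoring any extras.
    return Record(**{name: row[name] for name in Record.__dataclass_fields__})
```

A row that passes `validate` can then be handled as a typed object instead of a loose dict, which may help catch schema drift early.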

Dataset splits

train_clean_360:
- Bytes: 3310792673
- Samples: 116451
train_other_500:
- Bytes: 5367393098
- Samples: 205035
train_clean_100:
- Bytes: 931052544
- Samples: 33232

Dataset size

Download size: 1522957345 bytes
Total size: 9609238315 bytes
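
The per-split figures can be cross-checked against the reported totals with a few lines of arithmetic. The sketch below uses only numbers from this listing; the variable names are illustrative. It confirms that the three split sizes sum to the total size, and it computes the download-to-total ratio, which comes out to roughly 16%.

```python
# Figures copied from the split and size listings.
SPLITS = {
    "train_clean_360": {"num_bytes": 3310792673, "num_examples": 116451},
    "train_other_500": {"num_bytes": 5367393098, "num_examples": 205035},
    "train_clean_100": {"num_bytes": 931052544, "num_examples": 33232},
}
DOWNLOAD_SIZE = 1522957345  # bytes
DATASET_SIZE = 9609238315   # bytes

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The split byte counts should add up to the reported total size.
assert total_bytes == DATASET_SIZE

# Fraction of the total size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

for name, s in SPLITS.items():
    avg = s["num_bytes"] / s["num_examples"]
    print(f"{name}: {avg:.0f} bytes per sample on average")
print(f"total samples: {total_examples}")
print(f"download / total: {download_ratio:.3f}")
```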

Configuration

config_name: default
- Data file paths:
  - train_clean_360: data/train_clean_360-*
  - train_other_500: data/train_other_500-*
  - train_clean_100: data/train_clean_100-*
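
As a rough illustration of how the glob patterns above map files to splits, the sketch below matches a path against each pattern with the standard-library `fnmatch`. The helper names `split_of` and `load_split` are hypothetical, and so is the shard file name in the usage note. `load_split` assumes the third-party Hugging Face `datasets` package is installed and that the dataset is reachable on the Hub; it is not called here.

```python
from fnmatch import fnmatch

# Patterns copied from the "default" config.
DATA_FILES = {
    "train_clean_360": "data/train_clean_360-*",
    "train_other_500": "data/train_other_500-*",
    "train_clean_100": "data/train_clean_100-*",
}


def split_of(path: str):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_split(split: str = "train_clean_100"):
    """Load one split from the Hub (assumes `datasets` is installed; needs network)."""
    from datasets import load_dataset  # third-party, imported lazily

    return load_dataset("theodorr/librittsr_encodec", split=split)
```

For example, `split_of("data/train_clean_100-00000-of-00002.parquet")` returns `"train_clean_100"`; that shard name is made up for illustration. Starting with the smallest split, train_clean_100, keeps a first download comparatively light.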
