theodorr/ljspeech
Source: Hugging Face. Updated 2024-05-07; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/theodorr/ljspeech
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 22050
  - name: file
    dtype: string
  - name: text
    dtype: string
  - name: normalized_text
    dtype: string
  - name: align
    sequence: string
  - name: audio_token
    sequence:
      sequence:
        sequence:
          sequence: int64
  - name: text_token
    sequence: int64
  - name: align_token
    sequence:
      sequence: string
  splits:
  - name: train
    num_bytes: 3930310686.5
    num_examples: 13100
  download_size: 3818885219
  dataset_size: 3930310686.5
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
The dataset has nine features: id, audio, file, text, normalized_text, align, audio_token, text_token, and align_token. The id, file, text, and normalized_text fields are strings; audio is an audio feature sampled at 22,050 Hz; align is a sequence of strings; audio_token is a four-level nested sequence of int64 values; text_token is a sequence of int64 values; and align_token is a two-level nested sequence of strings. There is a single train split with 13,100 examples. The download size is 3,818,885,219 bytes and the dataset size is 3,930,310,686.5 bytes. The only configuration is default, whose train data files match the path data/train-*.
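The byte counts above are easier to judge in larger units. The following is a small sketch that converts them to GiB and derives an average size per example; the constant names are illustrative, and the figures are copied from the card rather than measured.

```python
# Figures copied from the dataset card; nothing here is measured.
DOWNLOAD_SIZE_BYTES = 3818885219
DATASET_SIZE_BYTES = 3930310686.5
NUM_EXAMPLES = 13100

GIB = 2**30  # bytes per gibibyte

download_gib = DOWNLOAD_SIZE_BYTES / GIB
dataset_gib = DATASET_SIZE_BYTES / GIB
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

print(f"download size: {download_gib:.2f} GiB")
print(f"dataset size:  {dataset_gib:.2f} GiB")
print(f"average per example: {avg_bytes_per_example / 1024:.1f} KiB")
```

The average covers every column of an example (audio plus text and token fields), so it is only a rough guide to the audio payload itself.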
Provider:
theodorr
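Summary of original information
Dataset overview
Dataset features
- id: string
- audio: audio, sampling rate 22050 Hz
- file: string
- text: string
- normalized_text: string
- align: sequence of strings
- audio_token: nested sequences of int64 (four levels)
- text_token: sequence of int64
- align_token: nested sequences of strings (two levels)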
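The feature list above can be checked against a real record. The sketch below assumes the `datasets` library is installed and that the repository `theodorr/ljspeech` is still reachable on the Hugging Face Hub; `nesting_depth` and `first_example` are hypothetical helper names introduced here, not part of the dataset or the library.

```python
def nesting_depth(value):
    """Count how many list levels wrap the innermost scalar (0 for a scalar)."""
    depth = 0
    while isinstance(value, (list, tuple)):
        if not value:
            # An empty list still counts as one level of nesting.
            return depth + 1
        depth += 1
        value = value[0]
    return depth


def first_example(repo_id="theodorr/ljspeech"):
    """Fetch one training record; needs network access and `datasets`."""
    # Imported lazily so nesting_depth works without the library installed.
    from datasets import load_dataset

    # streaming=True reads records on demand instead of downloading everything.
    stream = load_dataset(repo_id, split="train", streaming=True)
    return next(iter(stream))
```

With a record from `first_example()`, `nesting_depth(example["audio_token"])` would be expected to return 4 and `nesting_depth(example["align_token"])` to return 2, provided the stored data matches the declared schema. Decoding the audio column may need additional audio packages, depending on the installed `datasets` version.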
Dataset splits
- train:
  - Size: 3930310686.5 bytes
  - Number of examples: 13100
Dataset size
- Download size: 3818885219 bytes
- Total dataset size: 3930310686.5 bytes
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*
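The `path` entry is a glob pattern rather than a single file name. As a rough illustration, the sketch below applies it with Python's `fnmatch`; the candidate file names are invented for the example, and the Hub resolves patterns with its own logic, so this only approximates the behavior.

```python
from fnmatch import fnmatch

PATTERN = "data/train-*"  # taken from the data_files entry above

# Hypothetical file names, invented for illustration only.
candidates = [
    "data/train-00000-of-00008.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the names that the pattern selects for the train split.
matched = [name for name in candidates if fnmatch(name, PATTERN)]
print(matched)
```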



