choejiin/aihub-132-preprocessed-D23-0
Source: Hugging Face | Updated: 2024-07-05 | Indexed: 2024-07-06
Download link:
https://hf-mirror.com/datasets/choejiin/aihub-132-preprocessed-D23-0
Summary:
This dataset contains audio recordings and their text transcripts, sampled at 16000 Hz. It is split into training, test, and validation sets with 151,098, 18,888, and 18,887 examples respectively (188,873 in total). Each example includes the audio, its text transcript, input features, and labels. The input features are sequences of float32 values and the labels are sequences of int64 values. The total dataset size is 213,581,040,706 bytes (about 213.6 GB) and the download size is 67,303,465,385 bytes (about 67.3 GB).
Provided by:
choejiin
Summary of original information
Dataset overview
License
- Apache 2.0
Dataset info
Features
- audio:
  - Sampling rate: 16000 Hz
- transcripts:
  - Data type: string
- input_features:
  - Sequence type: float32
- labels:
  - Sequence type: int64
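The sketch below shows one way to check a single decoded example against the feature list above. It makes several assumptions. The example is taken to arrive as plain Python objects. `audio` is taken to decode to a dict with `array` and `sampling_rate` keys, which is the shape the Hugging Face `datasets` Audio feature usually produces. `input_features` and `labels` are taken to be flat lists. The helper `summarize_example` and the `ExampleSummary` type are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence

# Sampling rate stated on the dataset card.
EXPECTED_SAMPLING_RATE = 16000


@dataclass
class ExampleSummary:
    """Hypothetical per-example summary; not a type shipped with the dataset."""
    duration_seconds: float
    transcript_chars: int
    num_input_features: int
    num_labels: int


def summarize_example(example: Mapping[str, Any]) -> ExampleSummary:
    """Check one example against the card's feature types and summarize it.

    Assumption: audio decodes to {"array": [...], "sampling_rate": int},
    and input_features / labels are flat Python lists.
    """
    audio = example["audio"]
    if audio["sampling_rate"] != EXPECTED_SAMPLING_RATE:
        raise ValueError(f"unexpected sampling rate: {audio['sampling_rate']}")

    transcript = example["transcripts"]
    if not isinstance(transcript, str):
        raise TypeError("transcripts should be a string")

    features: Sequence[float] = example["input_features"]
    labels: Sequence[int] = example["labels"]
    if not all(isinstance(x, float) for x in features):
        raise TypeError("input_features should contain floats")
    if not all(isinstance(x, int) for x in labels):
        raise TypeError("labels should contain integers")

    return ExampleSummary(
        # Duration in seconds = number of samples / samples per second.
        duration_seconds=len(audio["array"]) / audio["sampling_rate"],
        transcript_chars=len(transcript),
        num_input_features=len(features),
        num_labels=len(labels),
    )
```

For instance, a hypothetical example with 32,000 audio samples at 16000 Hz yields a duration of 2.0 seconds. If the real `input_features` turn out to be nested (for example, one list per frame), flatten them or adjust the float check accordingly.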
Splits
- train:
  - Size: 170864380237.48862 bytes (about 170.9 GB)
  - Examples: 151098
- test:
  - Size: 21358895643.394917 bytes (about 21.4 GB)
  - Examples: 18888
- valid:
  - Size: 21357764825.116463 bytes (about 21.4 GB)
  - Examples: 18887
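A little arithmetic cross-checks the split figures. This sketch copies the numbers from the card and computes each split's share of all examples. It also divides each split's byte count by its example count. `split_report` is a hypothetical helper.

```python
# Split metadata copied from the dataset card: (size in bytes, number of examples).
SPLITS = {
    "train": (170864380237.48862, 151098),
    "test": (21358895643.394917, 18888),
    "valid": (21357764825.116463, 18887),
}


def split_report(splits):
    """Return each split's share of all examples and its average bytes per example."""
    total_examples = sum(n for _, n in splits.values())
    report = {}
    for name, (num_bytes, num_examples) in splits.items():
        report[name] = {
            "share": num_examples / total_examples,
            "bytes_per_example": num_bytes / num_examples,
        }
    return report


if __name__ == "__main__":
    for name, stats in split_report(SPLITS).items():
        print(f"{name}: {stats['share']:.1%} of examples, "
              f"{stats['bytes_per_example']:,.0f} bytes per example")
```

With these figures, the splits come out at roughly 80%, 10%, and 10%, and the three byte counts sum to the stated total of 213,581,040,706 bytes. Every split averages about 1,130,818 bytes per example. The card does not say how the per-split sizes were obtained. However, identical averages suggest they were apportioned from the total rather than measured separately, which would also explain the fractional byte values.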
Data size
- Download size: 67303465385 bytes (about 67.3 GB)
- Total dataset size: 213581040706 bytes (about 213.6 GB)
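The raw byte counts are easier to compare in larger units. The sketch below formats them in decimal (GB) or binary (GiB) units and computes the ratio of dataset size to download size. `human_readable` is a hypothetical helper. On Hugging Face dataset cards, the download size usually refers to the files fetched from the Hub and the dataset size to the prepared dataset, so a ratio above 1 is expected.

```python
# Sizes copied from the dataset card, in bytes.
DOWNLOAD_BYTES = 67303465385
DATASET_BYTES = 213581040706


def human_readable(num_bytes, base=1000):
    """Format a byte count in decimal (base 1000) or binary (base 1024) units."""
    if base == 1000:
        units = ["B", "KB", "MB", "GB", "TB"]
    else:
        units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    # Divide until the value fits below one unit step, then format it.
    for unit in units[:-1]:
        if value < base:
            return f"{value:.1f} {unit}"
        value /= base
    return f"{value:.1f} {units[-1]}"


if __name__ == "__main__":
    print("dataset:", human_readable(DATASET_BYTES), "/", human_readable(DATASET_BYTES, base=1024))
    print("download:", human_readable(DOWNLOAD_BYTES), "/", human_readable(DOWNLOAD_BYTES, base=1024))
    print("ratio:", round(DATASET_BYTES / DOWNLOAD_BYTES, 2))
```

For these figures, the dataset is about 213.6 GB (198.9 GiB) and the download is about 67.3 GB (62.7 GiB), a ratio of about 3.17.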
Configs
- config_name: default
  - Data files:
    - train: data/train-*
    - test: data/test-*
    - valid: data/valid-*
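The default config maps each split to a glob pattern over the repository's files. When the dataset is loaded with the `datasets` library, for example `load_dataset("choejiin/aihub-132-preprocessed-D23-0", split="valid")`, the library resolves these patterns itself. The sketch below only imitates that matching locally with Python's `fnmatch`. `assign_split` is a hypothetical helper, and the shard file names in the usage note are invented for illustration, not taken from the repository.

```python
from fnmatch import fnmatchcase

# Glob patterns from the card's default config.
DATA_FILES = {
    "train": "data/train-*",
    "test": "data/test-*",
    "valid": "data/valid-*",
}


def assign_split(path, patterns=DATA_FILES):
    """Return the split whose glob pattern matches a repo-relative path, or None."""
    for split, pattern in patterns.items():
        # Case-sensitive match, independent of the host OS.
        if fnmatchcase(path, pattern):
            return split
    return None
```

With a hypothetical shard name such as `data/valid-00000-of-00001.parquet`, `assign_split` returns `"valid"`. A file outside the patterns, such as `README.md`, returns `None`.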