mekaneeky/Processed-Luganda-SpeechT5-with-SALT-translation-11-7-23
Source: Hugging Face | Updated: 2023-07-11 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/mekaneeky/Processed-Luganda-SpeechT5-with-SALT-translation-11-7-23
Resource description:
---
dataset_info:
  features:
  - name: audio
    sequence:
      sequence: float32
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  - name: encoder_input_values
    sequence:
      sequence: float32
  - name: encoder_attention_mask
    sequence:
      sequence: int32
  - name: acholi_transcription
    dtype: string
  - name: lugbara_transcription
    dtype: string
  - name: english_transcription
    dtype: string
  - name: runyankole_transcription
    dtype: string
  - name: ateso_transcription
    dtype: string
  splits:
  - name: train
    num_bytes: 43512528901
    num_examples: 32352
  - name: validation
    num_bytes: 547401321
    num_examples: 407
  download_size: 9842097693
  dataset_size: 44059930222
---
# Dataset Card for "Processed-Luganda-SpeechT5-with-SALT-translation-11-7-23"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
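
Since the card gives no usage instructions, here is a minimal sketch of how one example might be inspected. It assumes the Hugging Face `datasets` library is installed and that the repository is still available under the ID shown in the title; the helper name `first_example` is hypothetical. Streaming is used because the metadata reports a dataset size of roughly 44 GB.

```python
# Repository ID taken from the page title; availability is an assumption.
REPO_ID = "mekaneeky/Processed-Luganda-SpeechT5-with-SALT-translation-11-7-23"


def first_example(split="validation"):
    """Stream a single example instead of downloading the whole dataset.

    Hypothetical helper: `split` should be "train" or "validation",
    the two splits listed in the card metadata.
    """
    # Imported lazily so this file can be read without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return next(iter(stream))
```

Calling `first_example()` should return a dictionary keyed by the feature names in the metadata (`audio`, `input_ids`, `english_transcription`, and so on), provided the repository is reachable.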
Provider:
mekaneeky
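Summary of original information
Dataset overview
Dataset name
- Name: Processed-Luganda-SpeechT5-with-SALT-translation-11-7-23
Dataset features
- Audio feature:
  - Name: audio
  - Data type: sequence of float32 sequences
- Input ID feature:
  - Name: input_ids
  - Data type: sequence of int32
- Attention mask feature:
  - Name: attention_mask
  - Data type: sequence of int8
- Encoder input values feature:
  - Name: encoder_input_values
  - Data type: sequence of float32 sequences
- Encoder attention mask feature:
  - Name: encoder_attention_mask
  - Data type: sequence of int32 sequences
- Transcription text features:
  - Name: acholi_transcription
  - Data type: string
  - Name: lugbara_transcription
  - Data type: string
  - Name: english_transcription
  - Data type: string
  - Name: runyankole_transcription
  - Data type: string
  - Name: ateso_transcription
  - Data type: string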
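
The feature list above can be read as a simple schema: each field is either a string or a list nested one or two levels deep. The following is a rough sketch of a shape check against that schema, not part of the dataset's own tooling. The names `EXPECTED_DEPTH`, `nesting_depth`, and `check_example` are hypothetical, and the toy record contains made-up values purely for illustration.

```python
# Nesting depth per field, derived from the card metadata:
# 0 = plain string, 1 = flat sequence, 2 = sequence of sequences.
EXPECTED_DEPTH = {
    "audio": 2,
    "input_ids": 1,
    "attention_mask": 1,
    "encoder_input_values": 2,
    "encoder_attention_mask": 2,
    "acholi_transcription": 0,
    "lugbara_transcription": 0,
    "english_transcription": 0,
    "runyankole_transcription": 0,
    "ateso_transcription": 0,
}


def nesting_depth(value):
    """Count how many list levels wrap the first leaf of `value`."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


def check_example(example):
    """Return a list of human-readable problems; empty means the shape matches."""
    problems = []
    for name, depth in EXPECTED_DEPTH.items():
        if name not in example:
            problems.append(f"missing field: {name}")
        elif depth == 0 and not isinstance(example[name], str):
            problems.append(f"{name}: expected a string")
        elif depth > 0 and nesting_depth(example[name]) != depth:
            problems.append(f"{name}: expected nesting depth {depth}")
    return problems


# Toy record with invented values, only to exercise the check.
toy = {
    "audio": [[0.0, 0.1]],
    "input_ids": [1, 2, 3],
    "attention_mask": [1, 1, 1],
    "encoder_input_values": [[0.5, 0.25]],
    "encoder_attention_mask": [[1, 1]],
    "acholi_transcription": "a",
    "lugbara_transcription": "b",
    "english_transcription": "c",
    "runyankole_transcription": "d",
    "ateso_transcription": "e",
}
print(check_example(toy))
```

A check like this only confirms nesting and string types; it says nothing about element dtypes (float32, int32, int8), which would need an array library to verify.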
Dataset splits
- Training set:
  - Number of examples: 32352
  - Data size: 43512528901 bytes
- Validation set:
  - Number of examples: 407
  - Data size: 547401321 bytes
Dataset size
- Download size: 9842097693 bytes
- Total dataset size: 44059930222 bytes
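
The byte counts above are easier to reason about after a little arithmetic. The sketch below uses only the figures listed on this page; the helper `gib` and the variable names are illustrative. It checks that the two splits add up to the reported total and estimates how much larger the dataset is on disk than the download.

```python
# Figures copied verbatim from the metadata on this page.
SPLITS = {
    "train": {"num_bytes": 43512528901, "num_examples": 32352},
    "validation": {"num_bytes": 547401321, "num_examples": 407},
}
DOWNLOAD_SIZE = 9842097693
DATASET_SIZE = 44059930222


def gib(num_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


total_bytes = sum(split["num_bytes"] for split in SPLITS.values())
total_examples = sum(split["num_examples"] for split in SPLITS.values())

# The per-split sizes should account for the whole dataset.
assert total_bytes == DATASET_SIZE

# Ratio of prepared size to download size.
expansion = DATASET_SIZE / DOWNLOAD_SIZE
validation_share = SPLITS["validation"]["num_examples"] / total_examples

print(f"download: {gib(DOWNLOAD_SIZE):.2f} GiB")
print(f"prepared: {gib(DATASET_SIZE):.2f} GiB")
print(f"expansion factor: {expansion:.2f}x")
print(f"validation share of examples: {validation_share:.2%}")
for name, split in SPLITS.items():
    per_example = split["num_bytes"] / split["num_examples"]
    print(f"{name}: about {per_example / 1e6:.2f} MB per example")
```

The gap between download size and prepared size suggests that local disk space, not bandwidth, is the main constraint, which is one reason to consider streaming access.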



