ghananlpcommunity/twi-stitched-words-asr
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/twi-stitched-words-asr
Resource description:
---
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: words
    dtype: string
  - name: word_count
    dtype: int32
  - name: word_indices
    dtype: string
  - name: word_boundaries
    dtype: string
  - name: total_duration_ms
    dtype: float32
  - name: filename
    dtype: string
  splits:
  - name: train
    num_bytes: 5759902201.75
    num_examples: 51682
  download_size: 5307931521
  dataset_size: 5759902201.75
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
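The card above can be exercised with a short loading sketch. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in the title. `EXPECTED_FIELDS`, `check_example`, and `stream_train` are illustrative names, not part of the dataset. The string-typed fields (`words`, `word_indices`, `word_boundaries`) are checked for type only, because the card does not document their internal encoding.

```python
from typing import Any, Dict, Iterator, List

REPO_ID = "ghananlpcommunity/twi-stitched-words-asr"

# Non-audio columns and the Python types they should decode to,
# taken from the dataset_info block (int32 -> int, float32 -> float).
EXPECTED_FIELDS: Dict[str, type] = {
    "text": str,
    "words": str,
    "word_count": int,
    "word_indices": str,
    "word_boundaries": str,
    "total_duration_ms": float,
    "filename": str,
}


def check_example(example: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one decoded row (empty if none)."""
    problems = []
    if "audio" not in example:
        problems.append("missing field: audio")
    for name, expected in EXPECTED_FIELDS.items():
        if name not in example:
            problems.append(f"missing field: {name}")
        elif not isinstance(example[name], expected):
            got = type(example[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {got}")
    return problems


def stream_train() -> Iterator[Dict[str, Any]]:
    """Stream the train split instead of downloading the whole archive first."""
    # Imported lazily so the schema check above works without the library.
    from datasets import load_dataset  # requires network access

    return iter(load_dataset(REPO_ID, split="train", streaming=True))
```

A possible use is `check_example(next(stream_train()))`, which should return an empty list if the first row matches the card. Decoding the `audio` column may need an additional audio backend, depending on the installed `datasets` version.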
Dataset metadata:
Features:
- Field: audio; type: audio; sampling rate: 16000 Hz
- Field: text; type: string
- Field: words; type: string
- Field: word_count; type: int32 (32-bit integer)
- Field: word_indices; type: string
- Field: word_boundaries; type: string
- Field: total_duration_ms (total duration in milliseconds); type: float32 (32-bit float)
- Field: filename; type: string
Splits:
- Split: train; size: 5759902201.75 bytes; examples: 51682
Download size: 5307931521 bytes; total dataset size: 5759902201.75 bytes
Configurations:
- Config name: default; data files:
- Split: train; path: data/train-*
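The split statistics allow a few quick sanity figures. The sketch below derives the average stored size per example and the ratio of download size to dataset size, using only the totals reported in the card. The helper name `summarize` is illustrative.

```python
NUM_BYTES = 5759902201.75      # dataset_size / train num_bytes from the card
NUM_EXAMPLES = 51682           # train num_examples from the card
DOWNLOAD_SIZE = 5307931521     # download_size in bytes from the card


def summarize(num_bytes: float, num_examples: int, download_size: int) -> dict:
    """Derive per-example and download-ratio figures from the card's totals."""
    return {
        "avg_bytes_per_example": num_bytes / num_examples,
        "download_to_dataset_ratio": download_size / num_bytes,
        "dataset_gib": num_bytes / 2**30,
        "download_gib": download_size / 2**30,
    }


if __name__ == "__main__":
    for key, value in summarize(NUM_BYTES, NUM_EXAMPLES, DOWNLOAD_SIZE).items():
        print(f"{key}: {value:.3f}")
```

With the values above, the average comes to roughly 111 kB per example, and the download is about 92% of the reported dataset size.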
Provider:
ghananlpcommunity



