1rsh/gujarati-f-openslr
Source: Hugging Face. Updated 2024-05-21; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/1rsh/gujarati-f-openslr
Resource description:
---
language:
- gu
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- automatic-speech-recognition
pretty_name: Gujarati OpenSLR
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1316669603.1636739
    num_examples: 1997
  - name: validation
    num_bytes: 148733649.60432628
    num_examples: 222
  download_size: 1172117279
  dataset_size: 1465403252.7680001
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---
# Gujarati OpenSLR Female
Interspeech data downloaded from https://www.openslr.org/resources/78/gu_in_female.zip
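As a minimal sketch of how this data might be read, the snippet below loads one split through the Hugging Face `datasets` library. The repository name and split counts come from the card metadata; the helper `load_split` and the constant names are hypothetical, and the sketch assumes `datasets` and an audio decoding backend are installed.

```python
# Sketch only: load one split of the dataset with the Hugging Face `datasets`
# library. Nothing here is taken from the dataset author's own code.

REPO_ID = "1rsh/gujarati-f-openslr"  # repository name from this card

# Split names and example counts as declared in the card metadata.
EXPECTED_EXAMPLES = {"train": 1997, "validation": 222}


def load_split(split: str):
    """Return the requested split, rejecting names the card does not declare."""
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(EXPECTED_EXAMPLES)}"
        )
    # Imported lazily so the name check above works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

Calling `load_split("train")` may download up to about 1.17 GB on first use. Each row should then expose the `audio` and `text` columns listed in the metadata.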
## Dataset Details
- Gujarati speech data. Most entries are shorter than 30 seconds, so each clip fits within the 30-second window that Whisper models process, which makes Whisper suitable for accurate timestamp prediction.
- The audio appears to have been recorded by a single female speaker.
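The 30-second remark can be checked per clip from the sample count and sampling rate. The following is a rough sketch of that check; the function names are hypothetical, and it assumes each clip is available as a `(num_samples, sampling_rate)` pair.

```python
# Sketch only: check which clips fit inside Whisper's 30-second window.

WHISPER_WINDOW_SECONDS = 30.0  # Whisper processes audio in 30-second windows


def duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Clip length in seconds: number of samples divided by samples per second."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return num_samples / sampling_rate


def fits_whisper_window(
    num_samples: int,
    sampling_rate: int,
    limit: float = WHISPER_WINDOW_SECONDS,
) -> bool:
    """True when the clip is no longer than the window."""
    return duration_seconds(num_samples, sampling_rate) <= limit


def share_within_window(clips) -> float:
    """Fraction of (num_samples, sampling_rate) pairs that fit the window."""
    clips = list(clips)
    if not clips:
        return 0.0
    fitting = sum(fits_whisper_window(n, sr) for n, sr in clips)
    return fitting / len(clips)
```

With rows decoded by the `datasets` library, `len(row["audio"]["array"])` and `row["audio"]["sampling_rate"]` would typically supply the two inputs, assuming the decoded audio exposes those keys.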
Provider:
1rsh
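Original information summary
Dataset overview
Basic information
- Language: Gujarati (gu)
- License: Apache-2.0
- Size category: 1K<n<10K
- Task category: automatic speech recognition
- Pretty name: Gujarati OpenSLR
Dataset features
- audio: audio data type
- text: string data type
Dataset splits
- Training set:
  - Number of examples: 1997
  - Bytes: 1316669603.1636739
- Validation set:
  - Number of examples: 222
  - Bytes: 148733649.60432628
Download and dataset size
- Download size: 1172117279 bytes
- Dataset size: 1465403252.7680001 bytes
Configuration
- Default configuration:
  - Training data path: data/train-*
  - Validation data path: data/validation-*
Dataset details
- Data characteristics: most entries are shorter than 30 seconds, which makes them suitable for accurate timestamp prediction with Whisper models.
- Audio characteristics: the audio appears to be spoken by a single female speaker.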
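The split counts and sizes in the summary above imply roughly a 90/10 train/validation split and a download that is about 80% of the decoded dataset size. The short sketch below reproduces that arithmetic from the card's own numbers; the variable and function names are illustrative only.

```python
# Sketch only: derive split proportions and size ratios from the card metadata.

SPLITS = {
    "train": {"num_examples": 1997, "num_bytes": 1316669603.1636739},
    "validation": {"num_examples": 222, "num_bytes": 148733649.60432628},
}
DOWNLOAD_SIZE = 1172117279          # bytes, from the card
DATASET_SIZE = 1465403252.7680001   # bytes, from the card

# Total number of examples across both splits.
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# Share of examples in each split.
split_fractions = {
    name: s["num_examples"] / total_examples for name, s in SPLITS.items()
}

# The per-split byte counts should add up to the reported dataset size.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Ratio of the compressed download to the prepared dataset.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


print(f"examples: {total_examples}")
print({name: round(frac, 3) for name, frac in split_fractions.items()})
print(f"download: {to_gb(DOWNLOAD_SIZE):.2f} GB, dataset: {to_gb(DATASET_SIZE):.2f} GB")
```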



