BoburAmirov/asr_evaluate_set

Name: BoburAmirov/asr_evaluate_set
Creator: BoburAmirov
Published: 2025-12-11 11:44:58
License: No description available

Hugging Face, updated 2025-12-11, indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/BoburAmirov/asr_evaluate_set

Description:

This dataset is designed for evaluating Uzbek speech-to-text (STT) models on real-world conversational speech. The audio consists of voice messages collected from various open Telegram groups, covering multiple speakers, diverse acoustic environments, varied speaking styles, and real-world recording conditions. It contains 745 audio files totaling 1 hour 40 minutes (~100 minutes), with an average duration of ~8 seconds per sample. The transcriptions are manually annotated, human-verified ground-truth labels, lowercased and with punctuation removed. The dataset is saved in Arrow format as a `datasets.Dataset` object with three fields: audio file name, audio data (including array and sampling rate), and transcription text. It is suitable for evaluating STT performance on conversational speech, benchmarking ASR systems on real-world voice messages, testing model robustness to varied acoustic conditions, and comparing different STT models.
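Because the reference transcriptions are lowercased and stripped of punctuation, model output generally needs the same normalization before scoring. The following is a minimal sketch of that step together with a corpus-level word error rate (WER), not the dataset author's evaluation script. The function names are hypothetical, the sample strings are made up for illustration, and keeping the apostrophe is an assumption (Uzbek Latin spelling uses it inside words); adjust it if the references follow a different rule.

```python
import string

# Assumption: the apostrophe is kept because Uzbek Latin spelling uses it
# inside words; every other ASCII punctuation mark is removed.
_PUNCT = string.punctuation.replace("'", "")
_STRIP_TABLE = str.maketrans("", "", _PUNCT)


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(text.lower().translate(_STRIP_TABLE).split())


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs):
    """WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words if words else 0.0
```

In use, each pair would combine a transcription from the dataset with the output of the STT model under test, for example `corpus_wer([(row_text, model_output), ...])`. The exact column names are not stated in this listing, so inspect the loaded dataset's schema before wiring this up.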

Provided by:

BoburAmirov
