bullseye-4/Agri_STT_Benchmarking_Dataset

Name: bullseye-4/Agri_STT_Benchmarking_Dataset
Creator: bullseye-4
Published: 2026-03-20 12:50:45
License: apache-2.0

Source: Hugging Face. Updated: 2026-03-20. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/bullseye-4/Agri_STT_Benchmarking_Dataset

Resource description:

---
license: apache-2.0
task_categories:
- audio-classification
- automatic-speech-recognition
language:
- hi
- te
- or
size_categories:
- 10K<n<100K
---

This is a domain-specific, multilingual agricultural speech dataset covering Hindi, Telugu, and Odia, designed for speech-to-text and automatic speech recognition (ASR) tasks. It features human-annotated transcriptions and is intended for benchmarking ASR model performance in real-world agricultural scenarios.

The accompanying paper benchmarks 10 ASR models for agricultural advisory use across Hindi, Telugu, and Odia, using 10,934 real-world Farmer.Chat audio recordings with human-annotated transcripts. It introduces Agriculture Weighted Word Error Rate (AWWER) and LLM-based utility scoring to evaluate domain-critical agricultural terminology, which the traditional WER, CER, and MER metrics do not weight specially.

Publishing Context: The research paper, titled “Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts”, is published on arXiv (arXiv:2602.03868). The accompanying agricultural ASR benchmark dataset is publicly released on Hugging Face to support reproducible research and community-driven development.

## Reference

If you use this dataset, please cite:

Paper link: https://arxiv.org/abs/2602.03868 (PDF: https://arxiv.org/pdf/2602.03868)
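For readers unfamiliar with the traditional metrics named in the description, the following is a minimal sketch of word error rate (WER) and character error rate (CER) computed from Levenshtein edit distance. It is not the paper's evaluation code and does not implement AWWER, whose weighting scheme is defined in the paper. The function names and the example sentences are hypothetical, chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, deletions, and insertions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Note: this iterates over Unicode code points, not grapheme clusters,
    which is a simplification for Indic scripts.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcript pair, not taken from the dataset.
    ref = "the crop needs water"
    hyp = "the crop need water"
    print(f"WER = {wer(ref, hyp):.2f}")
    print(f"CER = {cer(ref, hyp):.2f}")
```

Plain WER treats every word equally, so an error on a crop or pesticide name costs the same as an error on a function word; that limitation is what the description cites as the motivation for AWWER.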

Provided by:

bullseye-4
