
bullseye-4/Agri_STT_Benchmarking_Dataset

Source: Hugging Face. Updated 2026-03-20; added to the catalog 2026-03-29.
Download link:
https://hf-mirror.com/datasets/bullseye-4/Agri_STT_Benchmarking_Dataset
Description:
---
license: apache-2.0
task_categories:
- audio-classification
- automatic-speech-recognition
language:
- hi
- te
- or
size_categories:
- 10K<n<100K
---

This is a domain-specific, multilingual agricultural speech dataset focused primarily on Hindi, Telugu, and Odia, designed for speech-to-text (automatic speech recognition, ASR) tasks. It provides human-annotated transcriptions and is intended for benchmarking ASR model performance in real-world agricultural scenarios.

The accompanying paper presents a comprehensive benchmark of 10 ASR models for agricultural advisory use in Hindi, Telugu, and Odia, using 10,934 real-world Farmer.Chat audio recordings with human-annotated transcripts. It introduces the Agriculture Weighted Word Error Rate (AWWER) and LLM-based utility scoring to evaluate domain-critical agricultural terminology more faithfully than the traditional WER, CER, and MER metrics.

Publishing context: the research paper “Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts” is published on arXiv (arXiv:2602.03868), and the accompanying agricultural ASR benchmark dataset is publicly released on Hugging Face to support reproducible research and community-driven development.

## Reference

If you use this dataset, please cite the paper:

Paper link: https://arxiv.org/abs/2602.03868
PDF: https://arxiv.org/pdf/2602.03868
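The card contrasts the traditional WER, CER, and MER metrics with a domain-weighted alternative (AWWER). The following is a minimal sketch, assuming plain whitespace tokenization and no text normalization, of how the three standard metrics fall out of a single Levenshtein alignment: WER = (S + D + I) / N, CER is the same ratio over characters, and MER = (S + D + I) / (H + S + D + I). The function `weighted_wer_sketch` and its `term_weights` dictionary are hypothetical illustrations of the general idea of weighting agricultural terms more heavily; the card does not give the AWWER formula, so this is not the paper's metric. The English example sentences are likewise illustrative only. CER here counts Unicode code points, so dependent vowel signs in Devanagari, Telugu, or Odia script count as separate characters.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int]  # ("H" | "S" | "D", reference index) or ("I", -1)


def align(ref: List[str], hyp: List[str]) -> List[Op]:
    """Return one minimum-edit (Levenshtein) alignment of ref against hyp."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = fewest edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # hit or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Walk back from the bottom-right cell to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                ops.append(("H" if cost == 0 else "S", i - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("D", i - 1))
            i -= 1
        else:
            ops.append(("I", -1))
            j -= 1
    return ops[::-1]


def counts(ops: List[Op]) -> Dict[str, int]:
    """Tally hits (H), substitutions (S), deletions (D) and insertions (I)."""
    return {k: sum(1 for op, _ in ops if op == k) for k in "HSDI"}


def wer(ref: str, hyp: str) -> float:
    """WER = (S + D + I) / N over whitespace-split words (assumes N > 0)."""
    c = counts(align(ref.split(), hyp.split()))
    return (c["S"] + c["D"] + c["I"]) / len(ref.split())


def cer(ref: str, hyp: str) -> float:
    """CER: the same ratio over Unicode code points (spaces included)."""
    c = counts(align(list(ref), list(hyp)))
    return (c["S"] + c["D"] + c["I"]) / len(ref)


def mer(ref: str, hyp: str) -> float:
    """MER = (S + D + I) / (H + S + D + I)."""
    c = counts(align(ref.split(), hyp.split()))
    errors = c["S"] + c["D"] + c["I"]
    return errors / (c["H"] + errors)


def weighted_wer_sketch(ref: str, hyp: str, term_weights: Dict[str, float]) -> float:
    """HYPOTHETICAL domain-weighted WER, NOT the paper's AWWER formula.

    Reference words found in term_weights count with that weight (others 1.0);
    a substituted or deleted reference word adds its weight to the error total,
    and each inserted word adds 1.0. The total is divided by the summed
    reference weight.
    """
    words = ref.split()
    errors = 0.0
    for op, k in align(words, hyp.split()):
        if op in ("S", "D"):
            errors += term_weights.get(words[k], 1.0)
        elif op == "I":
            errors += 1.0
    return errors / sum(term_weights.get(w, 1.0) for w in words)


if __name__ == "__main__":
    ref = "apply urea to the wheat crop"
    hyp = "apply area to wheat crop"
    print(round(wer(ref, hyp), 3))  # 0.333: 1 substitution + 1 deletion over 6 words
    print(weighted_wer_sketch(ref, hyp, {"urea": 3.0}))  # 0.5: the missed term weighs 3
    print(round(mer("apply urea", "apply urea now"), 3))  # 0.333, while WER is 0.5
```

In the first example, plain WER treats the misrecognized fertilizer name "urea" the same as the dropped function word "the"; the hypothetical weighting raises the error from 0.333 to 0.5 because the domain term carries more weight. The insertion example shows why MER, unlike WER, stays below 1 even when the hypothesis is longer than the reference.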
Provider:
bullseye-4