bullseye-4/Agri_STT_Benchmarking_Dataset
Source: Hugging Face. Updated: 2026-03-20. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/bullseye-4/Agri_STT_Benchmarking_Dataset
Description:
---
license: apache-2.0
task_categories:
- audio-classification
- automatic-speech-recognition
language:
- hi
- te
- or
size_categories:
- 10K<n<100K
---
This is a domain-specific, multilingual agricultural speech dataset with a primary focus on Hindi, Telugu, and Odia, designed for speech-to-text and automatic speech recognition (ASR) tasks. It features human-annotated transcriptions and is intended for benchmarking ASR model performance in real-world agricultural scenarios.
The accompanying paper benchmarks 10 ASR models for agricultural advisory use across Hindi, Telugu, and Odia, using 10,934 real-world Farmer.Chat audio recordings with human-annotated transcripts. It introduces the Agriculture Weighted Word Error Rate (AWWER) and LLM-based utility scoring, which evaluate domain-critical agricultural terminology more directly than the traditional WER, CER, and MER metrics.
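For readers unfamiliar with the baseline metrics named above, the following is a minimal sketch of the standard WER and CER definitions: edit distance between reference and hypothesis, divided by the reference length, computed over words and characters respectively. It is not the paper's evaluation code, and it does not implement AWWER, whose weighting scheme is defined in the paper and not reproduced here. The function names and the English example sentences are illustrative assumptions, not taken from the dataset.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, deletions, and insertions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcript pair (not from the dataset): one domain term
    # is misrecognized and one function word is dropped.
    reference = "apply urea after the first irrigation"
    hypothesis = "apply area after first irrigation"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this toy pair, plain WER counts the misrecognized fertilizer name and the dropped article as one error each, although only the former changes the advice a farmer would receive. That equal weighting is the limitation the paper cites as motivation for a domain-weighted metric.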
Publishing Context:
The research paper titled “Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts” is published on arXiv (arXiv:2602.03868), and the accompanying agricultural ASR benchmark dataset is publicly released on Hugging Face to support reproducible research and community-driven development.
## Reference
If you use this dataset, please cite:
Paper link:
https://arxiv.org/abs/2602.03868
https://arxiv.org/pdf/2602.03868
Provided by:
bullseye-4



