AI71ai/agrillm-qa-eval-800
收藏Hugging Face2025-12-11 更新2026-01-03 收录
下载链接:
https://hf-mirror.com/datasets/AI71ai/agrillm-qa-eval-800
下载链接
链接失效反馈官方服务:
资源简介:
agrillm-qa-eval-800是一个高质量评估数据集,专注于农业知识和推理。该数据集由ai71与CGIAR、ECHO、Digital Green等农业领域的领先组织合作创建,旨在作为评估农业领域大型语言模型(LLMs)和农业相关RAG应用及AI代理的开放基准。数据集包含800个精选样本,分为人类专家生成的Q&A对和通过LLMs从农业文档中提取的合成Q&A对。数据集结构包含三列:系统提示、用户提示和参考答案。主要用途是评估LLMs在农业领域的事实准确性、推理能力以及多步骤问题解决技能。数据集存在一定局限性,如不反映真实农民行为或田间条件,以及在某些主题领域的代表性不足。
agrillm-qa-eval-800 is a high-quality evaluation dataset focused on agricultural knowledge and reasoning. The dataset was assembled by ai71 in partnership with leading organizations across the agricultural sector such as CGIAR, ECHO, and Digital Green. It is intended as an open benchmark for evaluating Agricultural Domain LLMs, and agriculture-focused RAG applications and AI agents. The dataset contains 800 curated samples, including human expert-generated Q&A pairs and synthetic Q&A pairs extracted from agricultural documents using LLMs. The dataset structure includes three columns: System Prompt, User Prompt, and Reference Answer. Its primary use is to assess LLMs factual accuracy, reasoning skills, and multi-step problem-solving abilities in agriculture. The dataset has limitations such as not reflecting real farmer behavior or field conditions and potential under-representation in certain subject areas.
提供机构:
AI71ai



