AI71ai/agrillm-train-146k

Name: AI71ai/agrillm-train-146k
Creator: AI71ai
Published: 2025-12-11 10:01:53
License: No description available

Source: Hugging Face (updated 2025-12-11, indexed 2025-12-20)

Download link:

https://hf-mirror.com/datasets/AI71ai/agrillm-train-146k

Description:

`agrillm-train-146k` is a supervised training dataset focused on agricultural knowledge and reasoning. It was assembled by ai71 together with partner organizations across the agricultural sector, including CGIAR, ECHO, Digital Green, Embrapa, FAO, the World Bank, IFAD, the Gates Foundation, KALRO, KIADPAI, and the Extension Foundation, among other contributors. The dataset contains ~146,000 curated samples drawn from four sources: human expert-generated Q&A pairs (~15,000 samples), Q&A pairs extracted from real-world interactions (~5,000 samples), synthetic Q&A pairs generated by LLMs through controlled extraction from agricultural documents (~85,000 samples), and synthetic domain-specific tasks (~40,000 samples). All partner datasets were cleaned, standardized, and fully anonymized before inclusion to ensure the dataset contains no personal or sensitive data. It is intended as a robust, reusable foundation for training or fine-tuning models in agriculture, with the aim of improving factual grounding, domain-specific reasoning, and comprehension.
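The four per-source counts above are rounded approximations. Their sum is about 145,000, which is consistent with the stated total of ~146,000. The sketch below computes the approximate mixture proportions from those rounded figures. It uses only the numbers given in the description. The component labels are our own shorthand and are not field names from the dataset itself.

```python
# Approximate sample counts per source, as stated in the dataset description.
# Every figure is given as "~N", so the computed shares are approximate as well.
# The dictionary keys are illustrative labels, not columns in the dataset.
COMPONENTS = {
    "human expert Q&A": 15_000,
    "real-world interaction Q&A": 5_000,
    "synthetic document-extracted Q&A": 85_000,
    "synthetic domain-specific tasks": 40_000,
}


def composition(components: dict[str, int]) -> dict[str, float]:
    """Return each component's share of the summed total, as a percentage."""
    total = sum(components.values())
    return {name: 100.0 * count / total for name, count in components.items()}


if __name__ == "__main__":
    total = sum(COMPONENTS.values())
    print(f"sum of listed components: {total:,}")  # vs. the stated ~146,000
    for name, share in composition(COMPONENTS).items():
        print(f"{name:35s} {share:5.1f}%")
```

Under these rounded counts, the two synthetic sources together account for roughly 86% of the samples. The human-authored and real-world Q&A pairs make up the remaining ~14%.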

Provider:

AI71ai
