CGIAR/gardian-ai-ready-docs
Source: Hugging Face. Updated: 2025-02-27. Indexed: 2025-04-08.
Download link:
https://hf-mirror.com/datasets/CGIAR/gardian-ai-ready-docs
Description:
This is a research corpus of 45,232 agricultural research publications from CGIAR, processed and structured for Large Language Model (LLM) applications in agricultural advisory services. The corpus bridges the gap between advanced agricultural research and field-level advisory needs, drawing on CGIAR's extensive scientific knowledge base, which is already used by both public and private extension services. Each document has been systematically processed with GROBID to extract structured content while preserving critical scientific context, metadata, and domain-specific agricultural knowledge. The corpus covers diverse agricultural topics, with particular emphasis on small-scale producer contexts in low- and middle-income countries. This machine-readable dataset is curated to improve the accuracy and contextual relevance of AI-generated agricultural advisories through Retrieval-Augmented Generation (RAG) frameworks.
Provider:
CGIAR
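
The description says the corpus is meant to feed Retrieval-Augmented Generation. As a rough illustration of the retrieval step only, the sketch below ranks a few passages against a question using bag-of-words cosine similarity and assembles a prompt from the best matches. Everything in it is an assumption made for illustration: the `PASSAGES` strings are placeholders and are not taken from the corpus, the function names are hypothetical, and a real pipeline would more likely use dense embeddings over text extracted from the dataset, whose field layout is not described here.

```python
import math
import re
from collections import Counter

# Placeholder topic strings for illustration only; NOT taken from the corpus.
PASSAGES = [
    "maize planting spacing guidance for smallholder farms",
    "rice irrigation scheduling in lowland fields",
    "cassava pest management overview",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=2):
    """Return up to k passages with non-zero similarity, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), i) for i, p in enumerate(passages)]
    # Sort by descending score; ties keep the original passage order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [passages[i] for score, i in scored[:k] if score > 0]


def build_prompt(query, context):
    """Assemble a prompt that asks the model to answer from the context only."""
    lines = ["Answer using only the context below.", "", "Context:"]
    lines += [f"- {c}" for c in context]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


if __name__ == "__main__":
    question = "rice irrigation timing"
    print(build_prompt(question, retrieve(question, PASSAGES)))
```

Passages that share no term with the question are dropped rather than padded into the prompt, so the model is not handed unrelated context when the retriever finds nothing relevant.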



