AI4Agr/CROP-benchmark

Name: AI4Agr/CROP-benchmark
Creator: AI4Agr
Published: 2024-12-02 07:08:44
License: cc-by-nc-4.0 (as declared in the dataset card)

Source: Hugging Face | Updated: 2024-12-02 | Indexed: 2024-06-29

Download link:

https://hf-mirror.com/datasets/AI4Agr/CROP-benchmark


Resource description:

---
license: cc-by-nc-4.0
---

## Introduction

Crop-benchmark is a large-scale open-source benchmark for LLMs in crop science. It includes 5045 high-quality multiple-choice questions and answers in Chinese and English.

## Basic Information

Currently, Crop-benchmark primarily covers two types of grains: rice and corn. The main topics covered by the benchmark are shown in the figure below.

<div style="text-align: center;">
    <img src="./Figures/benchmark_aft_distribution.png" alt="Benchmark Framework" width="60%"/>
</div>

Questions in Crop-benchmark have three difficulty levels: 0, 1, and 2, corresponding to difficult, moderate, and easy, respectively. The difficulty level is assigned based on the answers of GPT-3.5 and GPT-4: easy questions are those both models answered correctly, moderate questions are those only GPT-4 answered correctly, and difficult questions are those GPT-4 answered incorrectly.

## How to Use

We have released the benchmark in two formats: benchmark.xlsx and benchmark.json. Both contain the same content, so you can choose the format that suits your needs. Please note that the `level` field corresponds to the difficulty.

The code and prompts related to this benchmark are released at https://github.com/RenqiChen/The_Crop.

## BibTeX & Citation

If you find our code and datasets useful, please consider citing our work:

```bibtex
@inproceedings{zhangempowering,
  title={Empowering and Assessing the Utility of Large Language Models in Crop Science},
  author={Zhang, Hang and Sun, Jiawei and Chen, Renqi and Liu, Wei and Yuan, Zhonghang and Zheng, Xinzhe and Wang, Zhefan and Yang, Zhiyuan and Yan, Hang and Zhong, Han-Sen and others},
  booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track}
}
```
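As an illustration of the difficulty scheme and the `level` field described above, here is a minimal Python sketch. It assumes that benchmark.json is a JSON array of objects, each carrying a `level` key; the dataset card does not document the file layout, so adjust accordingly. The helper names (`difficulty_level`, `load_records`, `group_by_level`) are hypothetical and are not part of the released code.

```python
import json
from collections import defaultdict

# Level codes as described in the dataset card.
LEVEL_NAMES = {0: "difficult", 1: "moderate", 2: "easy"}


def difficulty_level(gpt35_correct: bool, gpt4_correct: bool) -> int:
    """Map the two models' outcomes to a level, following the card's rule.

    Any question GPT-4 answered incorrectly is treated as difficult (0),
    regardless of the GPT-3.5 outcome.
    """
    if not gpt4_correct:
        return 0  # difficult
    return 2 if gpt35_correct else 1  # easy if both correct, else moderate


def load_records(path: str) -> list:
    """Load the benchmark. Assumption: the file is a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def group_by_level(records: list) -> dict:
    """Group question records by their integer `level` value."""
    groups = defaultdict(list)
    for record in records:
        # int() tolerates levels stored as strings; this is a precaution,
        # not a documented property of the file.
        groups[int(record["level"])].append(record)
    return dict(groups)
```

With the file downloaded locally, `group_by_level(load_records("benchmark.json")).get(0, [])` would then return the questions marked as difficult, provided the layout assumption holds.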


Provided by:

AI4Agr
