AI4Agr/CROP-benchmark
Source: Hugging Face · Updated 2024-12-02 · Indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/AI4Agr/CROP-benchmark
---
license: cc-by-nc-4.0
---
## Introduction
Crop-benchmark is a large-scale, open-source benchmark for evaluating large language models (LLMs) in crop science. It contains 5,045 high-quality multiple-choice questions with answers, in both Chinese and English.
## Basic Information
Crop-benchmark currently covers two grain crops: rice and corn. The main topics in the benchmark are shown in the figure below.
<div style="text-align: center;">
<img src="./Figures/benchmark_aft_distribution.png" alt="Benchmark Framework" width="60%"/>
</div>
In Crop-benchmark, each question is assigned one of three difficulty levels, 0, 1, or 2, corresponding to difficult, moderate, and easy, respectively. Levels are assigned according to how GPT-3.5 and GPT-4 answered each question: easy questions were answered correctly by both models, moderate questions were answered correctly only by GPT-4, and difficult questions were answered incorrectly by GPT-4.
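The labeling rule above can be expressed as a small decision function. This is a minimal sketch, not the authors' labeling script: `assign_level` and `LEVEL_NAMES` are hypothetical names, and it assumes each model's answer to a question has already been scored as correct or incorrect. Note that a question GPT-4 misses is labeled difficult even if GPT-3.5 answered it correctly.

```python
# Hypothetical helper: human-readable names for the three `level` values.
LEVEL_NAMES = {0: "difficult", 1: "moderate", 2: "easy"}


def assign_level(gpt35_correct: bool, gpt4_correct: bool) -> int:
    """Map GPT-3.5/GPT-4 correctness on one question to a difficulty level.

    Returns 0 (difficult), 1 (moderate), or 2 (easy).
    """
    if not gpt4_correct:
        # GPT-4 answered incorrectly: difficult, regardless of GPT-3.5.
        return 0
    if not gpt35_correct:
        # Only GPT-4 answered correctly: moderate.
        return 1
    # Both models answered correctly: easy.
    return 2


if __name__ == "__main__":
    # Print the level for every combination of model outcomes.
    for g35 in (False, True):
        for g4 in (False, True):
            level = assign_level(g35, g4)
            print(f"GPT-3.5 correct={g35}, GPT-4 correct={g4} -> {level} ({LEVEL_NAMES[level]})")
```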
## How to Use
The benchmark is released in two formats, benchmark.xlsx and benchmark.json, with identical content; choose whichever suits your workflow.
Please note that the `level` field holds the difficulty (0, 1, or 2). The code and prompts for this benchmark are available at https://github.com/RenqiChen/The_Crop.
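A minimal sketch of loading the JSON release and grouping questions by difficulty, using only the Python standard library. It assumes benchmark.json is a JSON array of question objects, each carrying a `level` field; the other field names are not documented here, and `load_benchmark`, `count_by_level`, and `filter_by_level` are hypothetical helpers rather than part of the released code.

```python
import json
from collections import Counter
from pathlib import Path

# Human-readable names for the `level` values (0 = difficult, 1 = moderate, 2 = easy).
LEVEL_NAMES = {0: "difficult", 1: "moderate", 2: "easy"}


def load_benchmark(path):
    """Read benchmark.json, assumed to be a JSON array of question objects."""
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of question records")
    return records


def count_by_level(records):
    """Count questions per difficulty level; int() tolerates levels stored as strings."""
    return Counter(int(r["level"]) for r in records)


def filter_by_level(records, level):
    """Return only the questions at the given difficulty level."""
    return [r for r in records if int(r["level"]) == level]
```

For example, `count_by_level(load_benchmark("benchmark.json"))` gives the number of questions per level, and `filter_by_level(records, 0)` selects the difficult subset. For the spreadsheet version, `pandas.read_excel("benchmark.xlsx")` (with the openpyxl engine installed) returns a DataFrame whose `level` column can be grouped the same way.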
## BibTeX & Citation
If you find our code and datasets useful, please consider citing our work:
```bibtex
@inproceedings{zhangempowering,
title={Empowering and Assessing the Utility of Large Language Models in Crop Science},
author={Zhang, Hang and Sun, Jiawei and Chen, Renqi and Liu, Wei and Yuan, Zhonghang and Zheng, Xinzhe and Wang, Zhefan and Yang, Zhiyuan and Yan, Hang and Zhong, Han-Sen and others},
booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track}
}
```
Provided by: AI4Agr



