
AI4Agr/CROP-benchmark

Source: Hugging Face. Updated: 2024-12-02. Indexed: 2024-06-29.
Download link:
https://hf-mirror.com/datasets/AI4Agr/CROP-benchmark
Resource description:
---
license: cc-by-nc-4.0
---

## Introduction

Crop-benchmark is a large-scale open-source benchmark for LLMs in crop science. It contains 5045 high-quality multiple-choice questions and answers in Chinese and English.

## Basic Information

Currently, Crop-benchmark primarily covers two types of grains: rice and corn. The main topics involved in the benchmark are shown in the figure below.

<div style="text-align: center;">
<img src="./Figures/benchmark_aft_distribution.png" alt="Benchmark Framework" width="60%"/>
</div>

Questions in Crop-benchmark have three difficulty levels: 0, 1, and 2, corresponding to difficult, moderate, and easy, respectively. Difficulty levels are assigned based on the answers of GPT-3.5 and GPT-4. Easy questions are those both models answered correctly, moderate questions are those answered correctly only by GPT-4, and difficult questions are those answered incorrectly by GPT-4.

## How to Use

We have released two versions: benchmark.xlsx and benchmark.json. Both contain the same content, so you can choose the format that suits your needs. Note that the `level` field corresponds to the difficulty. The code and prompts related to this benchmark are released at https://github.com/RenqiChen/The_Crop.

## BibTeX & Citation

If you find our code and datasets useful, please consider citing our work:

```bibtex
@inproceedings{zhangempowering,
  title={Empowering and Assessing the Utility of Large Language Models in Crop Science},
  author={Zhang, Hang and Sun, Jiawei and Chen, Renqi and Liu, Wei and Yuan, Zhonghang and Zheng, Xinzhe and Wang, Zhefan and Yang, Zhiyuan and Yan, Hang and Zhong, Han-Sen and others},
  booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track}
}
```
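The difficulty rule and the `level` field described in the dataset card above can be sketched in Python as follows. This is only an illustration, not the authors' code: the helper names (`assign_level`, `count_by_level`, `load_records`) are hypothetical, and the assumption that benchmark.json holds a JSON array of question objects, each with a `level` key, is not confirmed by the card.

```python
import json
from collections import Counter

# Level codes as stated in the dataset card: 0 = difficult, 1 = moderate, 2 = easy.
LEVEL_NAMES = {0: "difficult", 1: "moderate", 2: "easy"}


def assign_level(gpt35_correct: bool, gpt4_correct: bool) -> int:
    """Map the two models' outcomes to a difficulty level per the card's rule."""
    if not gpt4_correct:
        return 0  # difficult: GPT-4 answered incorrectly
    if gpt35_correct:
        return 2  # easy: both models answered correctly
    return 1  # moderate: only GPT-4 answered correctly


def count_by_level(records):
    """Count questions per difficulty name, reading each record's `level` field."""
    return Counter(LEVEL_NAMES[int(r["level"])] for r in records)


def load_records(path="benchmark.json"):
    """ASSUMPTION: the file holds a JSON array of question objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With a local copy of the file, `count_by_level(load_records())` would give the number of questions at each difficulty, provided the assumed file layout holds. If the real file nests the questions under another key, only `load_records` would need adjusting.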

Provider:
AI4Agr