arXivBench
Source: arXiv. Indexed: 2025-09-30
Download link:
https://huggingface.co/datasets/arXivBenchLLM/arXivBench
Description:
arXivBench evaluates how well large language models (LLMs) respond to prompts that ask for relevant research papers, across eight major subject categories and five subfields of computer science. It contains 4,000 prompts for the eight major subject categories and 2,500 prompts for the five computer science subfields. The dataset also includes a monthly-updated Kaggle mirror of the arXiv dataset, so that paper information can be cross-referenced. The task is to assess whether a model, given a prompt, can produce relevant research papers together with accurate arXiv links.
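The cross-referencing step mentioned in the description could look roughly like the sketch below. It is only an illustration: the function names, the result labels, and the in-memory `metadata` dictionary are assumptions made for this example, not part of the dataset or its official evaluation code. A real run would build the index from the mirrored arXiv metadata, whose schema is not described on this page.

```python
import re

# Matches new-style arXiv identifiers in abs/pdf links, ignoring a version suffix.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def normalize(title):
    """Lowercase a title and collapse punctuation/whitespace for loose comparison."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def check_citation(claimed_title, link, metadata):
    """Compare a model-generated (title, link) pair against an id -> title index.

    `metadata` is a hypothetical dict standing in for the arXiv metadata mirror.
    """
    match = ARXIV_ID.search(link)
    if match is None:
        return "malformed_link"
    actual_title = metadata.get(match.group(1))
    if actual_title is None:
        return "unknown_id"
    if normalize(actual_title) == normalize(claimed_title):
        return "match"
    return "title_mismatch"


# Toy index with a single entry, for illustration only.
metadata = {"1706.03762": "Attention Is All You Need"}

print(check_citation("Attention is all you need",
                     "https://arxiv.org/abs/1706.03762v5", metadata))
```

Exact matching on normalized titles is a deliberately strict choice for this sketch; a fuzzy comparison could be substituted if small wording differences should still count as correct.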
Provided by:
arXivBenchLLM



