EVOEVAL
Source: arXiv. Updated: 2024-03-28. Indexed: 2024-06-21.
Download link:
https://github.com/evo-eval/evoeval
Resource overview:
EVOEVAL is a benchmark suite created by researchers at the University of Illinois Urbana-Champaign for evaluating the programming capabilities of large language models (LLMs). It is built by evolving problems from existing programming benchmarks. The dataset contains 828 problems across seven benchmarks: DIFFICULT, CREATIVE, SUBTLE, COMBINE, TOOL USE, VERBOSE, and CONCISE. Each benchmark applies a different instruction to evolve or transform existing problems, so that LLMs are tested across a range of programming scenarios. Beyond serving as a fixed benchmark, EVOEVAL can also be used to evolve arbitrary problems further, keeping pace with ongoing changes in LLMs and in the programming domain.
Provided by:
University of Illinois Urbana-Champaign
Created:
2024-03-28



