EVOEVAL

Name: EVOEVAL
Creator: University of Illinois Urbana-Champaign
Published: 2024-03-28 11:10:39
License: Not specified

Source: arXiv (2024-03-28); updated and indexed 2024-06-21

Download link:

https://github.com/evo-eval/evoeval

Description:

EVOEVAL is a benchmark suite developed by researchers at the University of Illinois Urbana-Champaign for evaluating the programming abilities of large language models (LLMs). It is built by evolving problems from existing programming benchmarks. The suite contains 828 problems across seven benchmarks: DIFFICULT, CREATIVE, SUBTLE, COMBINE, TOOL USE, VERBOSE, and CONCISE. Each benchmark evolves or transforms the source problems with a different instruction, so together they test LLMs across a range of programming scenarios. Beyond serving as a ready-made benchmark, EVOEVAL can also be used to evolve arbitrary new problems, keeping evaluation in step with ongoing progress in LLMs and in programming more broadly.

Provided by:

University of Illinois Urbana-Champaign

Created:

2024-03-28