CodeApex

Source: arXiv · Updated: 2024-03-11 · Indexed: 2024-08-06

Download link:

http://arxiv.org/abs/2309.01940v4

Description:

CodeApex is a bilingual benchmark dataset that evaluates large language models (LLMs) on three capabilities: programming comprehension, code generation, and code correction. The programming comprehension task tests LLMs on multiple-choice exam questions covering conceptual understanding, commonsense reasoning, and multi-hop reasoning. The code generation task asks LLMs to complete C++ functions from a provided description and prototype. The code correction task requires LLMs to fix real-world erroneous code segments that come with different error messages.

Created:

2023-09-05
