RepoDebug

Name: RepoDebug
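Creator: School of Computer Science and Engineering, Beihang University; East China Normal University; Beijing Institute of Technology; Baidu Inc.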
Published: 2025-09-04 18:13:21
License: Not specified

Source: arXiv. Updated 2025-09-04; indexed 2025-09-06.

Download link:

https:// 'https: ree-sitter

Description:

RepoDebug is a multi-task, multilingual code debugging dataset covering 8 widely used programming languages and 3 debugging tasks. It was built jointly by researchers from the School of Computer Science and Engineering at Beihang University, East China Normal University, Beijing Institute of Technology, and Baidu Inc. to evaluate the code debugging capabilities of large language models (LLMs). The dataset includes 22 distinct error types grouped into four main categories: syntax errors, reference errors, logic errors, and multiple errors. Each instance contains a buggy code file together with the error type and the error location. RepoDebug can be used to assess how well LLMs identify, locate, and fix code errors, and it supports further research on LLMs for code debugging.
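
As a rough illustration of the instance structure described above, the following Python sketch models one record and a simple check for the localization task. The class name, field names, category labels, and the scoring rule are all assumptions made for illustration; they are not taken from the dataset's actual schema or evaluation protocol.

```python
from __future__ import annotations

from dataclasses import dataclass

# The four top-level error categories named in the description.
# The short labels themselves are hypothetical.
ERROR_CATEGORIES = ("syntax", "reference", "logic", "multiple")


@dataclass(frozen=True)
class DebugInstance:
    """Hypothetical record layout; field names are illustrative only."""

    language: str                 # one of the 8 covered languages
    buggy_code: str               # full text of the buggy file
    error_category: str           # one of ERROR_CATEGORIES
    error_lines: tuple[int, ...]  # 1-indexed lines containing the bug(s)

    def __post_init__(self) -> None:
        if self.error_category not in ERROR_CATEGORIES:
            raise ValueError(f"unknown category: {self.error_category}")


def localization_hit(instance: DebugInstance, predicted_lines: set[int]) -> bool:
    """Return True if every annotated buggy line appears in the prediction."""
    return set(instance.error_lines) <= predicted_lines
```

A model's predicted line numbers could then be compared against an instance with `localization_hit(instance, {2, 5})`; stricter or looser matching rules are equally plausible.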

Created: 2025-09-04
