Defects4C: Benchmarking Large Language Model Repair Capability with C/C++ Bugs
Source: Figshare. Updated: 2025-11-03. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Defects4C_Benchmarking_Large_Language_Model_Repair_Capability_with_C_C_Bugs/30514943/1
Description:
Automated Program Repair (APR) plays a critical role in enhancing the quality and reliability of software systems. While substantial progress has been made in Java-based APR, largely facilitated by benchmarks like Defects4J, there remains a significant gap in research on C/C++ program repair, despite the widespread use of C/C++ and the prevalence of associated vulnerabilities. This gap is primarily due to the lack of high-quality, open-source benchmarks tailored for C/C++. To address this issue, we introduce Defects4C, a comprehensive and executable benchmark specifically designed for C/C++ program repair. Our dataset is constructed from real-world C/C++ repositories and includes a large collection of bug-relevant commits (9M in total), 248 high-quality buggy functions, and 102 vulnerable functions, all paired with test cases for reproduction. These resources enable rigorous evaluation of repair techniques and support the retraining of learning-based approaches for enhanced performance. Using Defects4C, we conduct a comprehensive empirical study evaluating the effectiveness of 24 state-of-the-art large language models (LLMs) in repairing C/C++ faults. Our findings offer valuable insights into the strengths and limitations of current LLM-based APR techniques in this domain, highlighting both the need for more robust methods and the critical role of Defects4C in advancing future research.
Provided by:
bowrl, jorn
Created:
2025-11-03



