BugCatcher-Data

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14536732
Description:
Bug datasets play a vital role in advancing software engineering tasks, including bug detection, fault localization, and automated program repair. These datasets enable the development of more accurate algorithms, facilitate efficient fault identification, and drive the creation of reliable automated repair tools. However, the manual collection and curation of such data are labor-intensive and prone to inconsistency, which limits scalability and reliability. Current datasets often fail to provide detailed and accurate information, particularly regarding bug types, descriptions, and classifications, reducing their utility in diverse research and practical applications. To address these challenges, we introduce BugCatcher, a comprehensive approach for constructing large-scale, high-quality bug datasets. BugCatcher begins by enhancing PR-Issue linking mechanisms, extending data collection to 12 programming languages over a decade, and ensuring accurate linkage between pull requests and issues. It employs a two-stage filtering process, BugCurator, to refine data quality, and utilizes large language models with Zero-shot Chain-of-Thought prompting to generate precise bug types and detailed descriptions. Furthermore, BugCatcher incorporates a robust classification framework, fine-tuning models for improved categorization. The resulting dataset, BugCatcher-Data, includes 243,265 bug-fix entries with comprehensive fields such as code diffs, bug locations, detailed descriptions, and classifications, serving as a substantial resource for advancing software engineering research and practices.
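As a rough illustration of what PR-Issue linking involves, the sketch below extracts issue numbers from pull request text using GitHub's documented closing keywords (close, fix, resolve and their inflections). This is not BugCatcher's linking mechanism, which the description does not specify; the function name `linked_issue_numbers` and the sample text are hypothetical.

```python
import re

# GitHub's closing keywords followed by an issue reference such as "#123".
# Assumption: same-repository references only; BugCatcher's actual rules
# are not given in the description above.
CLOSING_KEYWORD = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\b:?\s+#(\d+)",
    re.IGNORECASE,
)


def linked_issue_numbers(pr_text: str) -> list[int]:
    """Return issue numbers referenced with a closing keyword.

    Numbers keep their order of first appearance and are not repeated.
    """
    seen: list[int] = []
    for match in CLOSING_KEYWORD.finditer(pr_text):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


if __name__ == "__main__":
    sample = "Fixes #12 and closes #34. See also #56."
    print(linked_issue_numbers(sample))
```

A plain mention such as "See also #56" is deliberately ignored, since only a closing keyword signals that the pull request is meant to resolve the issue.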
Created:
2024-12-20