CleanCodeReview
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13828618
Description:
To investigate the problem of classifying source code review comments, we created a dataset suitable for evaluating and testing methods for this task. We combined four open datasets and manually labeled 3200 code review comments. The classification scheme is our own, based on the available datasets. The final dataset contains 10045 comments, divided into 16 classes and 5 groups for hierarchical classification.
The dataset contains comment classes such as:
Style - readability, code layout, indentation issues, and other common programming conventions
Naming - uniform style and convenience of naming variables, methods, classes
Questioning - questions to the author of the code, requests for clarification of the code or for usage examples
Response - assignment of other reviewers, acknowledgements, agreement with others, additions to the developer's opinion
Convention - discussion of the software development process
Testing - requests for tests to verify the functionality of the code
Design - architecture and code design, program structure control
Refactoring - logical structure, object creation, logical errors
Functionality - identification of code defects
Roadmap - further development of the program
Optimization - code optimization, parallelism, synchronization
Error - problems with exception and error handling
Documentation - problems with documentation or comments in the source code
Support - compatibility with other systems, support systems
Input/Output - input/output in the graphical user interface, problems with pop-up windows
Other - comments that carry no meaning without context
The classes are combined into groups as follows:
Code style (Style, Naming)
Discussion (Questioning, Response, Convention, Testing)
Development (Design, Refactoring, Functionality, Roadmap, Optimization, Error)
User (Documentation, Support, Input/Output)
Other (Other)
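The two-level hierarchy above can be written down as a lookup table, which is handy when deriving group labels from class labels for hierarchical classification. The following is a minimal sketch: the class and group names are copied from the lists above, while the names `GROUPS`, `CLASS_TO_GROUP` and `group_of` are illustrative assumptions and not part of the dataset. The exact label spellings used in the dataset files may differ.

```python
# Group -> classes, copied from the grouping listed in the description.
# The dictionary layout and all identifiers are assumptions for illustration.
GROUPS = {
    "Code style": ["Style", "Naming"],
    "Discussion": ["Questioning", "Response", "Convention", "Testing"],
    "Development": [
        "Design", "Refactoring", "Functionality",
        "Roadmap", "Optimization", "Error",
    ],
    "User": ["Documentation", "Support", "Input/Output"],
    "Other": ["Other"],
}

# Inverted lookup: class label -> group label.
CLASS_TO_GROUP = {
    cls: group for group, classes in GROUPS.items() for cls in classes
}


def group_of(class_label: str) -> str:
    """Return the group that a class label belongs to (hypothetical helper).

    Raises KeyError for a label that is not one of the 16 classes.
    """
    return CLASS_TO_GROUP[class_label]


if __name__ == "__main__":
    # Sanity check against the counts stated in the description.
    print(len(CLASS_TO_GROUP), "classes in", len(GROUPS), "groups")
    print("Naming ->", group_of("Naming"))
```

A flat classifier trained on the 16 classes can be scored at the group level by passing its predictions through `group_of`, without training a second model.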
Created:
2024-09-23



