Refactoring Multi-Label Dataset
Source: DataCite Commons. Updated 2025-06-29; indexed 2026-05-03.
Download link:
https://ieee-dataport.org/documents/refactoring-multi-label-dataset
Description:
Refactoring is the process of restructuring existing source code to improve its internal structure without altering its external behavior. Refactoring is essential to maintaining software quality; however, applying it manually is labor-intensive, and existing automated techniques often fall short because they rely on binary classification and neglect the co-occurrence of multiple refactoring needs. This study addresses this gap by proposing and evaluating multi-label machine learning models that predict combinations of 20 distinct refactoring operations across class, method, and variable granularities. We systematically investigate three multi-label learning strategies (Label Powerset, Classifier Chains, and Binary Relevance) combined with five base classifiers: Random Forest, Gradient Boosting, XGBoost, Decision Tree, and Artificial Neural Network. Experiments use 10-fold cross-validation on a real-world dataset, with feature selection techniques applied. Results indicate that variable-level metrics yield the highest predictive performance, with the Label Powerset strategy combined with Random Forest achieving a Jaccard accuracy of 95.30%. These findings highlight the efficacy of multi-label learning in modeling complex, real-world refactoring scenarios and provide a foundation for enhancing automated refactoring tools and advancing software maintenance practices.
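The description names two multi-label problem transformations (Binary Relevance and Label Powerset) and reports a Jaccard accuracy. The following is a minimal pure-Python sketch of what these terms mean, assuming each sample's labels are a 0/1 indicator vector over refactoring operations. It is not the authors' implementation. The function names and the three-label toy data are hypothetical and are not drawn from the dataset. The convention that a sample with no true and no predicted labels scores 1.0 is also an assumption; other tools may treat that case differently.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration only; labels are 0/1 indicator vectors.
LabelCombo = Tuple[int, ...]


def binary_relevance_targets(Y: Sequence[Sequence[int]]) -> List[List[int]]:
    """Binary Relevance: one independent binary target column per label."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


def label_powerset_encode(
    Y: Sequence[Sequence[int]],
) -> Tuple[List[int], Dict[LabelCombo, int]]:
    """Label Powerset: each distinct label combination becomes one multi-class id."""
    mapping: Dict[LabelCombo, int] = {}
    classes: List[int] = []
    for row in Y:
        key = tuple(row)
        if key not in mapping:
            mapping[key] = len(mapping)  # assign the next unused class id
        classes.append(mapping[key])
    return classes, mapping


def label_powerset_decode(
    classes: Sequence[int], mapping: Dict[LabelCombo, int]
) -> List[List[int]]:
    """Map predicted class ids back to label indicator vectors."""
    inverse = {cid: list(combo) for combo, cid in mapping.items()}
    return [inverse[c] for c in classes]


def jaccard_accuracy(
    Y_true: Sequence[Sequence[int]], Y_pred: Sequence[Sequence[int]]
) -> float:
    """Sample-averaged Jaccard accuracy: mean of |true AND pred| / |true OR pred|."""
    total = 0.0
    for t, p in zip(Y_true, Y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        # Assumed convention: no true and no predicted labels counts as a perfect match.
        total += 1.0 if union == 0 else inter / union
    return total / len(Y_true)


if __name__ == "__main__":
    # Toy data with 3 hypothetical refactoring labels per sample.
    Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    Y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
    print(round(jaccard_accuracy(Y_true, Y_pred), 4))  # (0.5 + 1.0 + 2/3) / 3
    classes, mapping = label_powerset_encode(Y_true)
    print(classes)  # one class id per distinct combination
```

Running the demo prints `0.7222` and then `[0, 1, 2]`. Label Powerset lets a single multi-class model such as Random Forest learn which refactorings co-occur, because each observed combination is its own class. Its trade-off is that it can only predict combinations seen during training. Binary Relevance is the opposite: it trains one model per label and ignores correlations between labels. For production use, scikit-learn's `jaccard_score` with `average="samples"` computes the same sample-averaged metric.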
Provider:
IEEE DataPort
Created:
2025-06-29