
Exploring Design Smells for Smell-Based Defect Prediction

Source: NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://zenodo.org/record/4103860
Description:
The archived file datasets.zip contains the datasets that support the conclusions of the article Exploring Design Smells for Smell-Based Defect Prediction. The paper answers two research questions. RQ1: Do Design code smells contribute to the performance of defect prediction models trained with Traditional code smells? RQ2: How do the different categories of Design smells impact the performance of the defect prediction models?

After extracting the archive, you will find two sub-directories named "RQ1" and "RQ2". They hold the results obtained for each research question. (You will also find a README.pdf file with these same instructions.)

Inside "RQ1" there are two directories, "configuration_1" and "configuration_2", one per experimental configuration. "configuration_1" contains the datasets with results for the ten classifier configurations with the highest scores. "configuration_2" contains the datasets with results for the classifier configuration with the overall best results, a Support Vector Machine with C=0.1. Within each configuration directory there are three sub-directories, "designite", "designite_traditional", and "traditional", holding the datasets for each smell set considered in the study.

Inside "RQ2" there are four directories. Each corresponds to one category of design smells for the "designite_traditional" dataset. These datasets were built with the same configuration as "configuration_2".

Within every directory there are 97 sub-directories, one for each of the 97 projects analyzed in the study. Every project folder has the same structure. The "dataset" directory contains the original training and testing datasets. The "oversamples" directory contains the training dataset after oversampling, for each feature selection approach. The "score_summary" directory contains all classifier configurations considered, not only the 10 with the highest scores. The "scores.csv" file contains all the scores for the main classifier configurations studied in the particular experiment. The "selected_features" directory contains the selected features' information and the selected-features dataset for each feature selection method. The "selected_testing_X" directory contains the testing datasets. The "top_scores_summary" directory contains the classifier configurations and hyper-parameter scores for the 10 highest scores.
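The directory layout described above can be traversed with a short script. The following is a minimal Python sketch, not part of the published dataset: the function names are hypothetical, the column names inside "scores.csv" are not documented here and are left to whatever header row each file has, and the search is recursive so that it does not depend on the exact nesting depth between a configuration directory and the project folders.

```python
import csv
from pathlib import Path


def find_score_files(root):
    """Return every scores.csv found anywhere under root, sorted for stable output."""
    return sorted(Path(root).rglob("scores.csv"))


def read_scores(path):
    """Read one scores.csv into a list of dicts keyed by the file's own header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def group_by_project(score_files):
    """Map each project folder name to its scores.csv paths.

    Assumption: scores.csv sits directly inside the project folder,
    so the parent directory name identifies the project.
    """
    grouped = {}
    for path in score_files:
        grouped.setdefault(path.parent.name, []).append(path)
    return grouped
```

For example, assuming the archive was extracted into a folder named "datasets", calling find_score_files("datasets/RQ1/configuration_2/designite_traditional") would collect the per-project score files for that smell set, and read_scores can then load each one for comparison across smell sets.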
Created:
2021-02-01