Class Noise in Two Software Engineering Data-Sets
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7527589
Description:
This repository contains two software engineering data-sets of historical code changes and their corresponding bag-of-words representations. Each data-set (build data and code review data) is extracted from open source Java projects. The directories are structured as follows:
1- In the 'code review data' directory, you can find 5 subdirectories.
The subdirectory 'raw_lines_of_code' contains historical code changes of 16 Java-based projects and the sentiment of code review comments for each line of code (class_value).
The subdirectory 'before_noise_handling' contains the bag-of-words representations of the project data in the 'raw_lines_of_code' subdirectory. Each project folder holds the training and testing data, generated using a 10-fold cross-validation scheme.
The subdirectory 'after_CF' contains a curated version of the files found in 'before_noise_handling', produced with the consensus filter algorithm for class noise handling.
The subdirectory 'after_MF' contains a curated version of the files found in 'before_noise_handling', produced with the majority filter algorithm for class noise handling.
The subdirectory 'after_DB' contains a curated version of the files found in 'before_noise_handling', produced with the DB algorithm for class noise handling.
2- In the 'build data' directory, you can find 3 subdirectories.
The subdirectory 'after_CF' contains a curated version of the files found in the original data, produced with the consensus filter algorithm for class noise handling.
The subdirectory 'after_MF' contains a curated version of the files found in the original data, produced with the majority filter algorithm for class noise handling.
The subdirectory 'after_DB' contains a curated version of the files found in the original data, produced with the DB algorithm for class noise handling.
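For readers unfamiliar with the two voting filters named above, the following is a minimal sketch of how majority and consensus filtering are commonly defined: several classifiers each produce a cross-validated prediction for every instance, and an instance is flagged as class noise when more than half of them (majority) or all of them (consensus) disagree with its recorded label. It is an illustration only, not the procedure used to produce the 'after_CF' and 'after_MF' files; the function name `noisy_indices`, the labels "pos"/"neg", and the three hypothetical classifiers are assumptions, and the DB algorithm is not covered.

```python
from typing import List, Sequence


def noisy_indices(
    labels: Sequence[str],
    predictions: Sequence[Sequence[str]],
    scheme: str = "majority",
) -> List[int]:
    """Return indices of instances flagged as class noise.

    labels: the recorded class value of each instance.
    predictions: one sequence per classifier, holding that classifier's
        held-out (cross-validated) prediction for every instance.
    scheme: "majority" flags an instance when more than half of the
        classifiers disagree with its label; "consensus" flags it only
        when every classifier disagrees.
    """
    if scheme not in ("majority", "consensus"):
        raise ValueError("scheme must be 'majority' or 'consensus'")
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")

    n_classifiers = len(predictions)
    flagged = []
    for i, label in enumerate(labels):
        # Count how many classifiers misclassify instance i.
        errors = sum(1 for clf_preds in predictions if clf_preds[i] != label)
        if scheme == "consensus":
            is_noisy = errors == n_classifiers
        else:
            is_noisy = errors > n_classifiers / 2
        if is_noisy:
            flagged.append(i)
    return flagged


# Hypothetical labels and predictions from three classifiers.
labels = ["neg", "pos", "neg", "pos"]
predictions = [
    ["neg", "neg", "pos", "pos"],
    ["neg", "neg", "pos", "pos"],
    ["neg", "neg", "neg", "pos"],
]

majority_flagged = noisy_indices(labels, predictions, "majority")
consensus_flagged = noisy_indices(labels, predictions, "consensus")
print(majority_flagged, consensus_flagged)
```

In this toy input, instance 1 is misclassified by all three classifiers and instance 2 by two of them, so the majority scheme flags both while the consensus scheme flags only instance 1. This reflects the usual trade-off: consensus filtering is more conservative and removes fewer instances.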
Created:
2023-01-12



