
Class Noise in Two Software Engineering Data-Sets

NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/7527589
Description:
This repository contains two software engineering data-sets of historical code changes and their corresponding bag-of-words representations. Each data-set (build data and code review data) is extracted from open source Java projects. The directories are structured as follows:

1- The 'code review data' directory contains 5 subdirectories:
- 'raw_lines_of_code' contains historical code changes of 16 Java-based projects and the sentiment of the code review comments for each line of code (class_value).
- 'before_noise_handling' contains the bag-of-words representations of the project data in the 'raw_lines_of_code' directory. Each project folder holds the training and testing data, generated using a 10-fold cross-validation scheme.
- 'after_CF' contains a curated version of the files in 'before_noise_handling', produced with the consensus filter algorithm for class noise handling.
- 'after_MF' contains a curated version of the files in 'before_noise_handling', produced with the majority filter algorithm for class noise handling.
- 'after_DB' contains a curated version of the files in 'before_noise_handling', produced with the DB algorithm for class noise handling.

2- The 'build data' directory contains 3 subdirectories:
- 'after_CF' contains a curated version of the files in the original data, produced with the consensus filter algorithm for class noise handling.
- 'after_MF' contains a curated version of the files in the original data, produced with the majority filter algorithm for class noise handling.
- 'after_DB' contains a curated version of the files in the original data, produced with the DB algorithm for class noise handling.
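The description names the majority filter and the consensus filter but does not spell out how they decide which instances to drop. As a rough illustration only, the sketch below shows the voting rule these two filters are commonly defined by: several classifiers each predict a label for every instance, and an instance is flagged as class noise when more than half of them (majority) or all of them (consensus) disagree with its recorded label. The function name `noisy_indices`, the label strings, and the toy predictions are hypothetical; the classifiers and settings actually used to build this repository are not stated in the listing.

```python
from typing import List, Sequence


def noisy_indices(labels: Sequence[str],
                  predictions: Sequence[Sequence[str]],
                  scheme: str = "majority") -> List[int]:
    """Return the indices of instances flagged as class noise.

    predictions[k][i] is the label classifier k assigned to instance i.
    In practice these would come from cross-validation, so that no
    classifier votes on an instance it was trained on (assumption).
    """
    if scheme not in ("majority", "consensus"):
        raise ValueError("scheme must be 'majority' or 'consensus'")
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")

    n_classifiers = len(predictions)
    flagged = []
    for i, recorded_label in enumerate(labels):
        # Count how many classifiers disagree with the recorded label.
        errors = sum(1 for preds in predictions if preds[i] != recorded_label)
        if scheme == "majority":
            is_noisy = errors > n_classifiers / 2
        else:  # consensus: every classifier must disagree
            is_noisy = errors == n_classifiers
        if is_noisy:
            flagged.append(i)
    return flagged


# Toy example: 4 instances, 3 hypothetical classifiers.
labels = ["pos", "neg", "pos", "neg"]
predictions = [
    ["pos", "pos", "neg", "neg"],
    ["pos", "pos", "neg", "neg"],
    ["pos", "neg", "neg", "neg"],
]
print(noisy_indices(labels, predictions, "majority"))
print(noisy_indices(labels, predictions, "consensus"))
```

Under this rule the consensus filter is the more conservative of the two: it can only flag a subset of what the majority filter flags, so it removes fewer instances at the cost of leaving more mislabeled ones in place.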
Created:
2023-01-12