
Self-Admitted Technical Debt in Commit Messages: Comparing Java, Python, and R

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12761214
Description:
The folders and the datasets within each are organized as follows.

Collection Folder: Contains the original dataset that we scraped. We have removed user names and email addresses to protect the users’ privacy.

RQ1 Folder: Has three subfolders:
❖     Manual Training: The manually labeled data we used to initially train the classifiers. Columns A-O in this dataset are all extracted from GitHub’s API. Column O (heading “message”) is the commit message itself. Columns P and Q (headings “author_a” and “author_b”) are the final classification (upon which the Cohen Kappa was calculated). Column R (heading “notes”) contains comments on specific cases that may be meaningful.
❖     Predicted: The results of the automatic classifiers (both 1st and 2nd round). The additional columns are generated by the classifiers.
❖     Verifications: The manually labeled data that we used in the 1st and 2nd verification rounds. This is a simplified dataset with the commit’s sha and the parsed message. The authors classified columns E and F independently and individually. The labels stated here are those that the authors agreed on (without having access to column D). Column D was added afterward, by sha-matching, by another author to calculate the Cohen Kappa. The yellow rows are those with disagreements.

RQ2_RQ3 Folder: Contains the manually labeled dataset for RQ2 and RQ3 (SATD Types and Activities).

NOTE: Many messages or classifications are multiline, so cells have to be expanded to read all of the text they contain.
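The agreement statistic mentioned for the “author_a” and “author_b” columns can be illustrated with a minimal sketch of Cohen's kappa. This is not the authors' script: the function name `cohen_kappa`, the in-memory row layout, and the label values `SATD` / `NOT_SATD` are assumptions made for illustration; only the two column headings come from the description above.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both raters must label the same number of items")
    n = len(labels_a)
    if n == 0:
        raise ValueError("at least one labeled item is required")

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is
        # undefined (0/0), so report full agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical rows mimicking the "author_a" / "author_b" columns.
# The label values are illustrative, not taken from the dataset.
rows = [
    {"author_a": "SATD", "author_b": "SATD"},
    {"author_a": "SATD", "author_b": "NOT_SATD"},
    {"author_a": "NOT_SATD", "author_b": "NOT_SATD"},
    {"author_a": "NOT_SATD", "author_b": "NOT_SATD"},
]
kappa = cohen_kappa(
    [r["author_a"] for r in rows],
    [r["author_b"] for r in rows],
)
print(kappa)  # 0.5
```

For the real spreadsheets, the two label lists would be read from the corresponding columns first; because many cells are multiline, a reader that handles quoted line breaks (such as Python's `csv` module on a CSV export) is the safer choice.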
Created:
2024-12-17