
1,237 Annotated Developer Apologies from GitHub

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10079440
Description:
Software Developer Apologies

This dataset contains 1,237 GitHub comments, each annotated as an apology or not an apology. It was released as part of the following publication: Benjamin S. Meyers. Human Error Assessment in Software Engineering. Rochester Institute of Technology. 2023.

Included Files

The "github_apologies.csv" file contains the full dataset of 1,237 GitHub comments with apology annotations. In total, 365 comments contain an apology and 872 do not. The comments are a subset of those included in 88.6 Million Developer Comments from GitHub.

Annotation Details

Full details are provided in the above publication. We implemented a naive classifier (Precision: 77.0%, Recall: 99.7%, F1: 86.9%, Accuracy: 91.1%) using counts of apology lemmas. 91% of developer comments containing at least one apology lemma matched our manual annotations. Agreement between raters was almost perfect (Cohen's Kappa = 0.94).

CSV Fields

ID: Unique identifier for the comment.
SOURCE: Whether this comment originates from a commit, issue, or pull request.
COMMENT_URL: The URL linking to the comment.
COMMENT_TEXT: The raw comment text.
NUM_APOLOGY_LEMMAS: The count of apology lemmas present in the comment.
CLASSIFIER_LABEL: The automatically assigned label ("Apology" or "Not Apology").
RATER_1_LABEL: The manually assigned label ("Apology" or "Not Apology") from Rater 1.
RATER_2_LABEL: The manually assigned label ("Apology" or "Not Apology") from Rater 2.
AGREED_LABEL: The agreed-upon label ("Apology" or "Not Apology") after Rater 1 and Rater 2 resolved disagreements.

Contact

Please contact Benjamin S. Meyers (email) with questions about this data and its collection.

Acknowledgments

Collection of this data has been sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
Created:
2024-01-04
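
The classifier metrics and the inter-rater agreement reported in the description can, in principle, be recomputed from the CSV columns. The following is a minimal sketch, not the authors' evaluation code. It assumes that the classifier was scored against AGREED_LABEL, that the label strings are exactly "Apology" and "Not Apology" as listed under CSV Fields, and that the file is UTF-8 encoded. The function names (load_rows, classifier_metrics, cohens_kappa) are hypothetical.

```python
import csv
from collections import Counter

APOLOGY = "Apology"  # positive-class label string, as listed under CSV Fields


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name (UTF-8 assumed)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def classifier_metrics(rows, predicted="CLASSIFIER_LABEL", truth="AGREED_LABEL"):
    """Precision, recall, F1, and accuracy with "Apology" as the positive class.

    Scoring against AGREED_LABEL is an assumption, not confirmed by the source.
    """
    tp = fp = fn = tn = 0
    for row in rows:
        pred = row[predicted] == APOLOGY
        gold = row[truth] == APOLOGY
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(rows) if rows else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def cohens_kappa(rows, rater_a="RATER_1_LABEL", rater_b="RATER_2_LABEL"):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(rows)
    if n == 0:
        raise ValueError("no rows to compare")
    observed = sum(row[rater_a] == row[rater_b] for row in rows) / n
    counts_a = Counter(row[rater_a] for row in rows)
    counts_b = Counter(row[rater_b] for row in rows)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

With the dataset downloaded, `rows = load_rows("github_apologies.csv")` followed by `classifier_metrics(rows)` and `cohens_kappa(rows)` would give values to compare against the figures in the description. Small differences are possible if the authors evaluated on a different label column or subset.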