200 Annotated Developer Human Errors from GitHub
Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10080448
Resource description:
Software Engineers' Human Errors
This dataset contains 200 GitHub comments with manual human error annotations, released as part of the following publication:
Benjamin S. Meyers. Human Error Assessment in Software Engineering. Rochester Institute of Technology. 2023.
Included Files
The "developer_human_errors.csv" file contains the full dataset of 200 software defect descriptions annotated with human error types (slips, lapses, mistakes) and T.H.E.S.E. categories.
CSV Fields
ID: Unique identifier for the comment.
SOURCE: Whether this comment originates from a commit, issue, or pull request.
COMMENT_URL: The URL linking to the comment.
COMMENT_TEXT: The raw comment text.
HUMAN_ERROR_TYPE: Whether the software defect described is a slip, lapse, or mistake.
THESE_V4_ID: Manually assigned T.H.E.S.E. category with labels corresponding to Version 4 of T.H.E.S.E.
THESE_NAME: Name corresponding to manually assigned T.H.E.S.E. category.
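The fields above can be read with a standard CSV parser. The following is a minimal sketch, assuming the file has a header row whose column names match the field names listed above exactly; the helper names (load_rows, count_by) and the sample rows are hypothetical placeholders, not records from the dataset, and the casing of the HUMAN_ERROR_TYPE values in the real file is not confirmed here.

```python
import csv
import io
from collections import Counter

# Column names taken from the "CSV Fields" list; the exact header
# spelling in the real file is an assumption.
EXPECTED_FIELDS = [
    "ID", "SOURCE", "COMMENT_URL", "COMMENT_TEXT",
    "HUMAN_ERROR_TYPE", "THESE_V4_ID", "THESE_NAME",
]


def load_rows(handle):
    """Read annotated comments from an open CSV handle into a list of dicts."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_by(rows, field):
    """Count how many rows carry each distinct value of the given field."""
    return Counter(row[field].strip() for row in rows)


# Hypothetical placeholder rows, used only so the sketch runs without the file.
SAMPLE = (
    "ID,SOURCE,COMMENT_URL,COMMENT_TEXT,HUMAN_ERROR_TYPE,THESE_V4_ID,THESE_NAME\n"
    "1,commit,https://example.invalid/1,placeholder text,slip,S01,Typos & Misspellings\n"
    "2,issue,https://example.invalid/2,placeholder text,lapse,L05,Forgetting an Import Statement\n"
    "3,commit,https://example.invalid/3,placeholder text,slip,S02,Syntax Errors\n"
)

rows = load_rows(io.StringIO(SAMPLE))
print(count_by(rows, "HUMAN_ERROR_TYPE"))
```

To run this against the real data, open the file with open("developer_human_errors.csv", newline="", encoding="utf-8") and pass the handle to load_rows in place of the in-memory sample.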
Annotation Details
Human error types span slips, lapses, and mistakes from James Reason's Generic Error Modelling System (GEMS):
Slips: Failures of attention.
Lapses: Failures of memory.
Mistakes: Failures of planning.
T.H.E.S.E. categories are summarized below:
S01: Typos & Misspellings
S02: Syntax Errors
S03: Overlooking Documented Information
S04: Multitasking Errors
S05: Hardware Interaction Errors
S06: Overlooking Proposed Code Changes
S07: Overlooking Existing Functionality
S08: General Attentional Failure
L01: Forgetting to Finish a Development Task
L02: Forgetting to Fix a Defect
L03: Forgetting to Remove Development Artifacts
L04: Working with Outdated Source Code
L05: Forgetting an Import Statement
L06: Forgetting to Save Work
L07: Forgetting Previous Development Discussion
L08: General Memory Failure
M01: Code Logic Errors
M02: Incomplete Domain Knowledge
M03: Wrong Assumption Errors
M04: Internal Communication Errors
M05: External Communication Errors
M06: Solution Choice Errors
M07: Time Management Errors
M08: Inadequate Testing
M09: Incorrect/Insufficient Configuration
M10: Code Complexity Errors
M11: Internationalization/String Encoding Errors
M12: Inadequate Experience Errors
M13: Insufficient Tooling Access Errors
M14: Workflow Order Errors
M15: General Planning Failure
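The category identifiers above appear to encode the human error type in their first letter (S for slips, L for lapses, M for mistakes). The sketch below relies on that reading, which is an inference from the list rather than a documented rule, and could be used to cross-check THESE_V4_ID against HUMAN_ERROR_TYPE. The function name error_type_for and the lowercase type labels are hypothetical.

```python
import re

# Assumed mapping, inferred from the S/L/M prefixes in the category list.
PREFIX_TO_TYPE = {"S": "slip", "L": "lapse", "M": "mistake"}

# Number of categories per prefix, as listed above (S01-S08, L01-L08, M01-M15).
CATEGORY_COUNTS = {"S": 8, "L": 8, "M": 15}

_ID_PATTERN = re.compile(r"([SLM])(\d{2})")


def error_type_for(these_id):
    """Return the error type implied by a T.H.E.S.E. Version 4 identifier."""
    match = _ID_PATTERN.fullmatch(these_id.strip().upper())
    if match is None:
        raise ValueError(f"not a T.H.E.S.E. identifier: {these_id!r}")
    prefix, number = match.group(1), int(match.group(2))
    if not 1 <= number <= CATEGORY_COUNTS[prefix]:
        raise ValueError(f"identifier out of range: {these_id!r}")
    return PREFIX_TO_TYPE[prefix]


print(error_type_for("S03"))  # an S-prefixed category maps to "slip"
print(error_type_for("M09"))  # an M-prefixed category maps to "mistake"
```

A row whose HUMAN_ERROR_TYPE disagrees with the type implied by its THESE_V4_ID would be worth inspecting by hand, since this sketch cannot tell whether the disagreement is an annotation error or a flaw in the assumed mapping.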
Contact
Please contact Benjamin S. Meyers (email) with questions about this data and its collection.
Acknowledgments
Collection of this data has been sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
Created:
2024-01-04



