
Self-Admitted Technical Debt in Scientific Software

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13174321
Resource description:
# Title

Self-Admitted Technical Debt (SATD) in Scientific Software Projects

# Description

## Abstract

This dataset contains annotated code comments from nine open-source scientific software projects: Astropy, Biopython, CESM, Firedrake, MOOSE, GROMACS, Elmer, Athena, and Root. The comments are labeled to identify instances of Self-Admitted Technical Debt (SATD), with a focus on a novel category termed Scientific Debt (SD). The dataset supports research on the nature and management of technical debt in scientific software.

## Purpose

The dataset was created to explore the prevalence and characteristics of SATD in scientific software, with the aim of improving software maintainability and scientific validity.

## Content

The dataset includes 28,680 annotated code comments, with labels indicating types of technical debt such as Code Debt, Design Debt, and Scientific Debt. Each comment is accompanied by metadata: the project name, file path, comment introduction date, and comment removal date.

## Scope

The dataset covers nine projects across different scientific domains, including astronomy, molecular biology, and climate modeling. Data was collected from publicly available repositories and spans from the inception of each project to the time of collection (version 1.0, July 2024).

## Methodology

Data was extracted with GitPython, which was used to access the version control histories of the selected projects. Comments were manually labeled for SATD, with a focus on Scientific Debt indicators such as assumptions, missing edge cases, computational inaccuracies, translation challenges, and new scientific findings.

## Usage Notes

This dataset can be used for research on technical debt management, software maintenance, and scientific software development. Users should have a basic understanding of programming and version control systems. Recommended analysis tools are Python and Pandas.

## Ethical Considerations

All data was collected from publicly available sources. No personal or sensitive information is included.

# Technical Details

## File Formats

- CSV: contains the annotated comments and metadata

## Size

- Number of records: 28,680
- Total file size: 15 MB

## Version

- Version 1.0, July 2024

# Access and Use

## Access

The dataset can be downloaded from Zenodo: [Zenodo Link](https://doi.org/10.5281/zenodo.13174322)

## License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
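Below is a minimal sketch of a first analysis of the CSV described in this card, following its usage notes. It counts SATD labels per project and computes how long removed comments lived. The card does not give the CSV's column headers, so several things here are assumptions:

- The column names `project`, `label`, `introduced`, and `removed` are hypothetical.
- Dates are assumed to be in ISO-8601 format.
- An empty removal date is assumed to mean the comment was never removed.

The sample rows are invented for illustration and are not taken from the dataset. The sketch uses only the standard library. With Pandas, the same summary is a `read_csv` followed by a `groupby` on the project column.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import date
from statistics import median

# HYPOTHETICAL column names: the dataset card lists the metadata fields
# but not the CSV headers, so rename these to match the real file.
PROJECT, LABEL = "project", "label"
INTRODUCED, REMOVED = "introduced", "removed"


def summarize(csv_file):
    """Return per-project label counts and the median lifetime, in days,
    of comments that were later removed."""
    label_counts = defaultdict(Counter)  # project -> Counter of labels
    lifetimes = defaultdict(list)        # project -> lifetimes in days
    for row in csv.DictReader(csv_file):
        project = row[PROJECT]
        label_counts[project][row[LABEL]] += 1
        # ASSUMPTION: dates are ISO-8601 (YYYY-MM-DD) and an empty removal
        # date means the comment was still present at collection time.
        if row[REMOVED]:
            introduced = date.fromisoformat(row[INTRODUCED])
            removed = date.fromisoformat(row[REMOVED])
            lifetimes[project].append((removed - introduced).days)
    median_days = {p: median(days) for p, days in lifetimes.items()}
    return label_counts, median_days


# Invented rows for illustration only; not taken from the dataset.
SAMPLE = """project,file_path,label,introduced,removed
Astropy,src/a.py,Scientific Debt,2019-03-01,2020-03-01
Astropy,src/b.py,Code Debt,2018-01-10,
GROMACS,src/c.cpp,Design Debt,2015-06-01,2015-06-11
"""

counts, median_days = summarize(io.StringIO(SAMPLE))
print(dict(counts["Astropy"]))  # {'Scientific Debt': 1, 'Code Debt': 1}
print(median_days)              # {'Astropy': 366, 'GROMACS': 10}
```

To run it on the downloaded file, pass `summarize` an open handle such as `open(path, newline="", encoding="utf-8")`, where `path` points at the CSV. Lifetimes are computed only for removed comments, so long-lived debt that was never removed does not pull the median down.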
Created: 2024-08-02