How Do Developers Document Technical Debt in Competitive Data Science? A Large-Scale Study of Kaggle Notebooks
Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20032196
Description:
Self-Admitted Technical Debt (SATD) represents explicit developer acknowledgments of temporary or suboptimal solutions in software artifacts. While SATD has been extensively studied in traditional software and production machine learning systems, its presence in competitive data science environments remains underexplored. In this paper, we present a large-scale empirical study of SATD in Kaggle competition notebooks, a setting characterized by time constraints, rapid experimentation, and performance-driven development. We analyze 8,751 notebooks across tabular, computer vision, and natural language processing competitions, identifying 440 SATD instances in 299 notebooks through a manual classification process based on a nine-category taxonomy for ML systems. Our results show that SATD is relatively rare in this context and is predominantly associated with data-related concerns and developer awareness. Furthermore, we find no significant correlation between SATD and notebook popularity. We also observe distinct SATD patterns across competition domains.
Provider:
Zenodo
Created:
2026-05-05



