
How Do Developers Document Technical Debt in Competitive Data Science? A Large-Scale Study of Kaggle Notebooks

Source: DataCite Commons (updated 2026-05-05; indexed 2026-05-07)
Download link:
https://zenodo.org/doi/10.5281/zenodo.20032196
Resource summary:
Self-Admitted Technical Debt (SATD) represents explicit developer acknowledgments of temporary or suboptimal solutions in software artifacts. While SATD has been extensively studied in traditional software and production machine learning systems, its presence in competitive data science environments remains underexplored. In this paper, we present a large-scale empirical study of SATD in Kaggle competition notebooks, a setting characterized by time constraints, rapid experimentation, and performance-driven development. We analyze 8,751 notebooks across tabular, computer vision, and natural language processing competitions, identifying 440 SATD instances in 299 notebooks through a manual classification process based on a nine-category taxonomy for ML systems. Our results show that SATD is relatively rare in this context and is predominantly associated with data-related concerns and developer awareness. Furthermore, we find no significant correlation between SATD and notebook popularity. We also observe distinct SATD patterns across competition domains.
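The study identified SATD instances through manual classification. As a rough illustration of how candidate comments might be surfaced before such a review, the Python sketch below scans the code cells of a notebook for common debt markers such as TODO, FIXME, and HACK. The marker list and the function names (load_notebook, code_comments, flag_satd_candidates) are hypothetical and are not the authors' pipeline. The sketch assumes notebooks stored as nbformat 4 JSON and only inspects '#' comments in code cells.

```python
import json
import re

# Hypothetical marker list for illustration only; this is NOT the paper's
# nine-category taxonomy, and a keyword hit is a candidate, not confirmed SATD.
SATD_MARKERS = re.compile(r"\b(todo|fixme|hack|xxx|workaround|temporary)\b", re.IGNORECASE)


def load_notebook(path: str) -> dict:
    """Parse a .ipynb file (nbformat 4 JSON) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def code_comments(notebook: dict) -> list[str]:
    """Collect '#' comments from code cells.

    Simplification: the first '#' on a line is treated as the start of a
    comment, so a '#' inside a string literal is misread as a comment.
    """
    comments = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are skipped
        source = cell.get("source", "")
        # nbformat stores cell source either as a list of lines or as one string
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            pos = line.find("#")
            if pos != -1:
                comments.append(line[pos + 1:].strip())
    return comments


def flag_satd_candidates(notebook: dict) -> list[str]:
    """Return comments containing a marker, for subsequent manual review."""
    return [c for c in code_comments(notebook) if SATD_MARKERS.search(c)]


if __name__ == "__main__":
    demo = {
        "cells": [
            {"cell_type": "code",
             "source": ["# TODO: tune lr properly\n",
                        "lr = 0.1  # quick hack\n",
                        "x = 1  # final value\n"]},
            {"cell_type": "markdown", "source": "# TODO in markdown is ignored"},
        ]
    }
    print(flag_satd_candidates(demo))  # ['TODO: tune lr properly', 'quick hack']
```

Keyword matching only narrows the search: it misses debt stated without a marker word and can flag harmless comments, which is why a manual, taxonomy-based classification such as the one described above is still needed to decide what counts as SATD.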
Provider:
Zenodo
Created:
2026-05-05