When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems - Replication Package
Source: Figshare. Updated: 2025-01-08. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_b_When_Code_Smells_Meet_ML_On_the_Lifecycle_of_ML-specific_Code_Smells_in_ML-enabled_Systems_b_-_Replication_Package/28167065
Description:
Replication package for the article "When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems". This package contains our online technical report, with more details on each ML-specific code smell, and our replication package with data, scripts, and our tool, CodeSmile. More details are in the included README.md.

Abstract: The adoption of Machine Learning (ML)-enabled systems is steadily increasing. Nevertheless, there is a shortage of ML-specific quality assurance approaches, possibly because of the limited knowledge of how quality-related concerns emerge and evolve in ML-enabled systems. In this paper, we investigate the emergence and evolution of a specific type of quality-related concern known as ML-specific code smells, i.e., sub-optimal implementation solutions applied to ML pipelines that may significantly decrease the quality and maintainability of ML-enabled systems. More specifically, we study ML-specific code smells by empirically analyzing (i) their prevalence in real ML-enabled systems, (ii) how they are introduced and removed, and (iii) their survivability. To this end, we conduct an exploratory study, mining a large dataset of ML-enabled systems and analyzing over 400k commits across \numProjectsSampling projects. We tracked and inspected the introduction and evolution of ML smells through CodeSmile, a novel ML smell detector that we built to enable our investigation and to detect ML-specific code smells.

Our results reveal that: 1) CodeSmile can detect ML-specific code smells (ML-CSs) with 0.87% and 0.78% of precision and recall, respectively. 2) ML-CSs are often introduced during file modifications made as part of new-feature tasks. 3) Smells are typically removed during new-feature, enhancement, or refactoring tasks. 4) The majority of ML-CSs are resolved within the first 10% of commits.
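
The survivability result can be made concrete with a small sketch. This is not the package's own analysis code: it illustrates one plausible reading of "resolved within the first 10% of commits", assuming a smell's lifetime is the number of commits between its introduction and its removal, divided by the project's total commit count. The names `SmellInstance`, `lifetime_fraction`, and `share_resolved_within` are hypothetical and do not come from CodeSmile.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SmellInstance:
    """Hypothetical record of one smell occurrence in one project."""
    introduced_at: int          # 0-based index of the commit that introduced the smell
    removed_at: Optional[int]   # index of the commit that removed it; None if still present
    total_commits: int          # number of commits in the project's history


def lifetime_fraction(smell: SmellInstance) -> Optional[float]:
    """Lifetime as a fraction of the project's commits (assumed definition)."""
    if smell.removed_at is None:
        return None  # the smell never disappeared, so it has no finite lifetime
    return (smell.removed_at - smell.introduced_at) / smell.total_commits


def share_resolved_within(smells: List[SmellInstance], threshold: float = 0.10) -> float:
    """Share of *removed* smells whose lifetime is at most `threshold`."""
    lifetimes = [f for f in map(lifetime_fraction, smells) if f is not None]
    if not lifetimes:
        return 0.0
    return sum(f <= threshold for f in lifetimes) / len(lifetimes)


# Made-up illustrative data, not taken from the replication package.
example = [
    SmellInstance(introduced_at=10, removed_at=15, total_commits=100),
    SmellInstance(introduced_at=0, removed_at=50, total_commits=100),
    SmellInstance(introduced_at=20, removed_at=None, total_commits=100),
]
print(share_resolved_within(example))
```

The sketch excludes smells that are never removed from the denominator; counting them instead would give a lower share, so the choice should follow whatever definition the README.md documents.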
Created:
2025-01-08



