
When Tools Overlook Domain Knowledge: An Empirical Study of Refactoring in Scientific Software

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15030237
Description:
Abstract

Refactoring is a critical process for improving code quality, but anecdotal evidence has shown that refactoring in scientific software (Sci-SW) is not always feasible. The inherently exploratory nature of scientific open-source software (Sci-OSS) development, characterized by evolving requirements and limited adoption of traditional software engineering practices, could present significant challenges to refactoring. However, there is no systematic study exploring refactoring practices in Sci-OSS. To bridge this gap, we explore the effectiveness of three state-of-the-art refactoring detection tools, RefDiff (C), RefactoringMiner (Java), and PyRef (Python), at detecting refactorings in Sci-OSS. Our findings reveal that these tools have significant limitations, detecting fewer refactorings in Sci-OSS than in non-scientific OSS (Non-Sci-OSS). Through a mixed-method approach, we identified seven new types of refactorings in Sci-OSS, with 67.54% of undetected refactorings requiring domain knowledge. To complement our analysis of the refactoring code changes, we conducted surveys with 47 practitioners experienced in refactoring Sci-OSS and 14 follow-up interviews to gain deeper insights into the associated challenges. Our results revealed seven novel challenges for Sci-OSS refactoring, including a domain knowledge gap. These findings emphasize the necessity for specialized tools and strategies to support refactoring in Sci-OSS effectively.

Replication instructions

The project is written in Python 3.12.0. The requirements file contains all the packages needed to run the project. To install them, run the following command:

    pip install -r requirements.txt

Once the required packages are installed, download the refactoring data from GitHub:

1. Add your GitHub API token in RQ1/scripts/mysettings.py
2. Run the download_data_from_github.py file in the RQ1/scripts folder.

This should download all the required files from GitHub to the data folder.

Once you have all the data, install the refactoring detection tools. The installation instructions are available at:

- For PyRef: https://github.com/PyRef/PyRef
- For RefactoringMiner 3.0: https://github.com/tsantalis/RefactoringMiner
- For RefDiff 2.0: https://github.com/aserg-ufmg/RefDiff

Once the refactoring detection tools are installed, change the execution path in the respective script. For example, change the pyref execution path in the pyref_script.py file in the RQ1/scripts folder. After changing the execution paths, you should be able to run the refactoring detection tools on all the collected data. For every refactoring instance, the detection tools create a JSON file containing the list of detected refactorings, named in the format 'repo_name_issue_number'. You can then run the scripts in the RQ1/scripts folder to replicate the results.
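As a quick sanity check before running the analysis scripts, the 'repo_name_issue_number' naming convention mentioned above can be used to index the tool output. The following is a minimal sketch and not part of the replication package: the helper names, the `.json` file extension, and the assumption that a result file holds a top-level list with one entry per detected refactoring are all hypothetical, so files with any other shape are reported as `None` rather than interpreted.

```python
import json
import tempfile
from pathlib import Path


def parse_result_name(path):
    """Split a 'repo_name_issue_number' file name into (repo_name, issue_number)."""
    # Repository names may themselves contain underscores,
    # so only the last underscore separates the issue number.
    repo_name, _, issue = Path(path).stem.rpartition("_")
    if not repo_name or not issue.isdigit():
        raise ValueError(f"unexpected result file name: {path}")
    return repo_name, int(issue)


def summarize_results(results_dir):
    """Map (repo_name, issue_number) to the number of top-level JSON entries."""
    summary = {}
    # Assumption: result files carry a .json extension.
    for path in sorted(Path(results_dir).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: a top-level list holds one entry per detected refactoring.
        # Any other shape is recorded as None instead of being guessed at.
        count = len(data) if isinstance(data, list) else None
        summary[parse_result_name(path)] = count
    return summary


if __name__ == "__main__":
    # Demonstration on synthetic files, not on real tool output.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "my_repo_42.json").write_text(json.dumps([{"type": "example"}]))
        Path(tmp, "other_7.json").write_text(json.dumps({"commits": []}))
        print(summarize_results(tmp))
```

Pointing `summarize_results` at the folder where a detection script wrote its output gives a per-issue overview, which makes it easy to spot instances for which a tool produced no file or an unexpected structure.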
Created:
2025-03-15