replication-package-Analyzing-the-Impact-of-Years-of-Experience-on-Code-Quality-main
DataCite Commons. Updated 2025-06-01. Indexed 2025-05-07.
Download link:
https://figshare.com/articles/dataset/replication-package-Analyzing-the-Impact-of-Years-of-Experience-on-Code-Quality-main/28306316/1
Description:
Replication Package for the Study: "Aged to Perfection? Analyzing the Impact of Years of Experience on Code Quality"
Replication Package Structure
### Programs for Data Extraction and Summarization
These programs extract and summarize information from sources such as GitHub, Workana, and SonarQube.
- `scraper.py`: Collects public Workana profiles relevant to the research scope. The results are saved in `workana_profiles.csv`.
- `fetch_repos.py`: Downloads up to five Git repositories for each developer listed in `workana_profiles.csv`.
- `fetch_sonar_qube.py`: Uses SonarQube to analyze repositories and generate reports for each repository folder.
- `aggregator.py`: Consolidates all SonarQube reports for a given developer into a unified report, which is stored in each developer’s folder.
### Helper Programs
These programs assist with miscellaneous tasks.
- `other_tools_and_helpers/anonimize_workana_profile_names`: Replaces names in the profiles with anonymous identifiers.
- `other_tools_and_helpers/get_github_repo_links`: Generates a list of all downloaded repositories for each developer. The results are saved in `github_repo_links.csv`.
### Analysis Programs
- `analysis.py`: Conducts statistical tests and descriptive analyses of the collected data. This program populates the `metrics-dataset` folder with summarized reports and creates the consolidated file `all_developer_metrics_workana_sonarqube.csv`.
### Data Files
- `workana_profiles.csv`: Contains the collected Workana profiles. To protect privacy, sensitive information has been redacted and replaced with `[REDACTED]`.
- `github_repo_links.csv`: Lists the GitHub repository links analyzed during the study.
- `collected_repos_report.xlsx`: Provides a summary of all collected repositories, including the programming languages used.
- `metrics-dataset` folder: Contains the summarized metrics extracted by SonarQube for each developer.
- `all_developer_metrics_workana_sonarqube.csv`: Aggregates the years of experience and SonarQube metrics for each developer. This file serves as the final dataset for statistical analysis.
### Execution
To replicate the study, follow these steps:
1. **Extract Workana profiles**: Run `scraper.py`.
2. **Collect GitHub repositories**: Execute `fetch_repos.py`, ensuring that a GitHub API key is configured.
3. **Analyze repositories with SonarQube**: Use the latest version of SonarQube Community Edition (as of November 2024). Ensure that the server is running locally on port 9001.
4. **Summarize metrics**: Run `aggregator.py` to consolidate the SonarQube analysis results.
5. **Analyze the data**: Execute the cells in `analysis.ipynb` to perform the final statistical analysis.
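To make the SonarQube and summarization steps above more concrete, the following is a minimal sketch of how per-repository measures could be read from a local SonarQube server and combined per developer. It is not the code from `fetch_sonar_qube.py` or `aggregator.py`. Only the local server on port 9001 comes from the package description. The metric keys, the project key `dev1_repoA`, the helper names, and the choice to sum values across repositories are all assumptions made for illustration. `/api/measures/component` is SonarQube's Web API endpoint for reading measures; depending on the server configuration, a request to it may also need an authentication token.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Port 9001 comes from the Execution steps; everything else here is assumed.
SONAR_URL = "http://localhost:9001"
# Assumed metric keys; the study may have collected a different set.
METRICS = ["bugs", "code_smells", "vulnerabilities"]


def measures_url(project_key, metrics=METRICS, base_url=SONAR_URL):
    """Build the Web API URL that returns the given measures for one project."""
    query = urlencode(
        {"component": project_key, "metricKeys": ",".join(metrics)},
        safe=",",  # keep the comma-separated metric list readable
    )
    return f"{base_url}/api/measures/component?{query}"


def parse_measures(payload):
    """Turn an /api/measures/component JSON response into {metric: value}."""
    measures = payload["component"]["measures"]
    return {m["metric"]: float(m["value"]) for m in measures}


def aggregate_developer(repo_reports):
    """Sum each metric over a developer's repositories (assumed strategy)."""
    totals = defaultdict(float)
    for report in repo_reports:
        for metric, value in report.items():
            totals[metric] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical response for a hypothetical project key.
    sample = {
        "component": {
            "key": "dev1_repoA",
            "measures": [
                {"metric": "bugs", "value": "3"},
                {"metric": "code_smells", "value": "12"},
            ],
        }
    }
    print(measures_url("dev1_repoA"))
    print(aggregate_developer([parse_measures(sample), {"bugs": 1.0}]))
```

Summing is only one possible way to consolidate reports. Averaging per repository or normalizing by lines of code would give a different per-developer dataset, so the actual strategy should be checked against `aggregator.py` before comparing results.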
Provider:
figshare
Created:
2025-01-29



