
Project Throughput Deliverd

Source: IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/project-throughput-deliverd
Description:
Data and Analysis Code for "Proactive Detection of Software Forecast Degradation Through Statistical Process Control"

Abstract
This dataset contains processed metrics, analysis scripts, and supplementary materials supporting the research paper "Proactive Detection of Software Forecast Degradation Through Statistical Process Control" published in IEEE Transactions on Software Engineering.

Dataset Contents:
The dataset comprises comprehensive Statistical Process Control (SPC) diagnostics and forecasting accuracy metrics for 1,204 real-world software projects spanning 11 business domains. Data includes:
- Processed Metrics File (unified_results_expanded_20251112_150856.xlsx): Project-level aggregated data containing 117 variables including SPC violation counts (8 Western Electric rules, CUSUM, coefficient of variation), forecast accuracy metrics (MAE from ML ensemble and Monte Carlo simulation), project characteristics (duration, domain, size), and risk stratification indicators. Each row represents one complete software project; the median project duration is 163 weeks.
- Analysis Script (spc_analysis_for_paper.py): Python code implementing all statistical analyses reported in the paper, including correlation analysis (Pearson and Spearman), ROC curve optimization, feature importance ranking via Random Forest, and stratified validation across risk levels, domains, and project sizes. Running this script reproduces all quantitative results and generates publication-ready figures.
- Generated Figures (5 PNG files, 300 DPI): Publication-quality visualizations including scatter plots with regression lines, feature importance rankings, ROC curves with optimal thresholds, box plots by risk stratification, and comparative degradation analysis.
- Documentation (README.md): Detailed instructions for reproducing all analyses, system requirements, and variable definitions.
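To make the SPC violation counts more concrete, the following is a minimal sketch of how two classic control-chart signals and the coefficient of variation might be computed for one weekly throughput series. It is not taken from spc_analysis_for_paper.py: the function name `spc_summary`, the choice of only two rules, the per-window counting scheme, and the use of control limits estimated from the series itself are all assumptions made for illustration.

```python
import numpy as np


def spc_summary(throughput):
    """Count two simple control-chart signals for one weekly throughput series.

    Hypothetical helper: the dataset's own rule definitions may differ.
    """
    x = np.asarray(throughput, dtype=float)
    center = x.mean()          # center line estimated from the series itself
    sigma = x.std(ddof=1)      # sample standard deviation

    # Signal A: single points more than 3 sigma away from the center line.
    beyond_3_sigma = int(np.sum(np.abs(x - center) > 3 * sigma))

    # Signal B: eight consecutive points on the same side of the center line.
    # Every length-8 window that satisfies the condition counts once.
    side = np.sign(x - center)
    runs_of_8 = 0
    for i in range(len(x) - 7):
        window = side[i:i + 8]
        if np.all(window == 1) or np.all(window == -1):
            runs_of_8 += 1

    # Coefficient of variation: dispersion relative to the mean.
    cv = sigma / center if center != 0 else float("nan")
    return {"beyond_3_sigma": beyond_3_sigma, "runs_of_8": runs_of_8, "cv": cv}


# Illustrative series only (not dataset values): a stable phase, then a shift.
print(spc_summary([10, 12] * 6 + [20] * 8))
```

In practice, control limits are often estimated from a baseline period rather than from the full series, so a production version would likely separate limit estimation from violation counting.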
Data Provenance:
Raw throughput data were obtained from the Public Jira Dataset (Montgomery et al., 2025, DOI: 10.5281/zenodo.15719919), which aggregates 1,822 open-source projects from 16 public Jira repositories. We applied rigorous filtering criteria (minimum 20 weeks duration, complete throughput records) resulting in 1,204 projects suitable for longitudinal SPC analysis. SPC diagnostics were computed using established Western Electric rules and statistical control methods. Forecast accuracy was assessed using a validated ensemble of 24 machine learning models with time-series cross-validation.

Key Findings Supported:
- Strong correlation between SPC violations and forecast degradation (Spearman ρ = 0.651, p < 0.001)
- 334% increased MAE for high-violation projects versus stable processes
- Optimal intervention threshold of 74 violations (ROC AUC = 0.820)
- Robust generalization across risk levels, domains, and project sizes

Reproducibility:
All analyses are fully reproducible using the provided Python script with standard scientific libraries (pandas, numpy, scipy, scikit-learn, matplotlib, seaborn). The dataset enables researchers to validate our findings, extend analyses to additional research questions, or apply SPC-based forecasting quality assessment to new project contexts.

Applications:
This dataset supports research in software project management, forecast quality assessment, process stability monitoring, early warning system development, and empirical software engineering. Practitioners can use the validated thresholds and diagnostic approaches to implement SPC-based forecast reliability monitoring in their organizations.

Ethical Considerations:
All data derive from publicly available open-source repositories with no personally identifiable information. Data processing adheres to community standards for empirical software engineering research.

Related Publication:
R. A. de Oliveira, J. de P. Ribeiro, and E. E. Scalabrin, "Proactive Detection of Software Forecast Degradation Through Statistical Process Control," IEEE Transactions on Software Engineering, vol. XX, no. X, 2025.

Keywords: Statistical process control, software forecasting, throughput prediction, Agile methods, machine learning, Monte Carlo simulation, forecast quality, early warning systems, empirical software engineering

---
Metadata Suggestions

Categories:
- Software Engineering
- Data Analytics
- Machine Learning
- Quality Assurance

File Format: XLSX (Excel), Python (.py), PNG (images), Markdown (.md)
License: CC BY 4.0 (Creative Commons Attribution)
Size: ~25 MB (compressed ZIP)
Programming Language: Python 3.8+
Dependencies: pandas, numpy, scipy, scikit-learn, matplotlib, seaborn, openpyxl
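The Spearman correlation and ROC-based threshold reported among the key findings could be approached roughly as sketched below, using the scipy and scikit-learn libraries listed as dependencies. This is a sketch under stated assumptions, not the authors' script: the function name `correlation_and_threshold` is hypothetical, the data are synthetic stand-ins rather than dataset values, the binary "degraded" label (MAE above its median) is an assumed definition, and Youden's J is one common way to pick a cut-off that the description does not confirm.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score, roc_curve


def correlation_and_threshold(violations, mae, degraded):
    """Spearman correlation plus a Youden's J cut-off on violation counts."""
    rho, p_value = spearmanr(violations, mae)

    # Treat the violation count as a score for the binary "degraded" label.
    auc = roc_auc_score(degraded, violations)
    fpr, tpr, thresholds = roc_curve(degraded, violations)

    # Youden's J = sensitivity + specificity - 1 = tpr - fpr (assumed criterion).
    best = int(np.argmax(tpr - fpr))
    return {
        "rho": rho,
        "p_value": p_value,
        "auc": auc,
        "threshold": float(thresholds[best]),
    }


# Synthetic stand-in data, NOT values from the dataset.
rng = np.random.default_rng(0)
violations = rng.poisson(60, size=500)
mae = 0.2 * violations + rng.normal(0.0, 1.0, size=500)
degraded = (mae > np.median(mae)).astype(int)  # hypothetical label definition

result = correlation_and_threshold(violations, mae, degraded)
print(result)
```

With the real file, the three arrays would come from the corresponding columns of the XLSX sheet (for example via `pandas.read_excel`, which needs openpyxl); the actual column names are documented in the dataset's README and are not assumed here.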
Provided by:
Rodrigo Almeida de Oliveira