
Replication Materials for: Engineering Formality and Software Risk in Debian Python Packages

Source: DataONE. Updated: 2024-01-10. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:b91458ee2ef0ed7720d45a7b93471785f9214829e02923b1deeb36f6d143aa94
Description:
These materials were produced as part of:

Gaughan, Matthew, Champion, Kaylea, and Hwang, Sohyeon. (2024) "Engineering Formality and Software Risk in Debian Python Packages." 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER2024).

They also include data initially produced in:

Champion, Kaylea; Hill, Benjamin Mako, 2021, "Replication data and online supplement for: Underproduction: An Approach for Measuring Risk in Open Source Software", https://doi.org/10.7910/DVN/PUCD2P, Harvard Dataverse, V2

In this archive, you'll find:

inst_all_packages_full_results.tab: Summary data for all Debian packages as they appear in Champion and Hill (2021).
mmt_data_final.csv: Summary data for all Python-language Debian packages as they appear in the paper. This data set is novel, and includes package age in days, GitHub milestone usage, two different calculations of mean membership type (MMT), and package name.
calculatePower.R: R code to reproduce the linear regression and power analysis methods as they appear in the paper.

For more information, please contact: Matt Gaughan (he/him), gaughan@u.northwestern.edu

Abstract: While Free/Libre and Open Source Software (FLOSS) is critical to global computing infrastructure, the maintenance of widely adopted FLOSS packages depends on volunteer developers who select their own tasks. Risk of failure due to the misalignment of engineering supply and demand, known as underproduction, has led to code base decay and subsequent cybersecurity incidents such as the Heartbleed and Log4Shell vulnerabilities. FLOSS projects are self-organizing but can often expand into larger, more formal efforts. Although some prior work suggests that becoming a more formal organization decreases project risk, other work suggests that formalization may in fact increase the likelihood of project abandonment.
We evaluate the relationship between underproduction and formality, focusing on formal structure, developer responsibility, and work processes management. We analyze 182 GNU/Linux packages made available via the Debian distribution and find that although more formal structures are associated with higher risk of underproduction, more elevated developer responsibility is associated with less underproduction while the relationship between formal work process management and underproduction is not statistically significant. Our analysis suggests that a FLOSS organization's transformation into a more formal structure may face unintended consequences which must be carefully managed.
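
For readers who want to explore mmt_data_final.csv outside of R, the following is a minimal Python sketch of a one-predictor least-squares fit. It is not the authors' calculatePower.R and does not reproduce the paper's models. The column names (`mmt`, `underproduction_score`) and the inline sample rows are hypothetical placeholders, since the actual header of the CSV is not given here; replace them after inspecting the real file.

```python
import csv
import io

# Synthetic stand-in data. The column names and values are hypothetical
# and do NOT come from mmt_data_final.csv.
SAMPLE_CSV = """package,mmt,underproduction_score
pkg_a,100,1.0
pkg_b,200,2.0
pkg_c,300,3.0
pkg_d,400,4.0
"""


def read_columns(handle, x_name, y_name):
    """Read two numeric columns from an open CSV file handle."""
    xs, ys = [], []
    for row in csv.DictReader(handle):
        xs.append(float(row[x_name]))
        ys.append(float(row[y_name]))
    return xs, ys


def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# For the real file, swap io.StringIO(SAMPLE_CSV) for
# open("mmt_data_final.csv", newline="") and use the actual column names.
xs, ys = read_columns(io.StringIO(SAMPLE_CSV), "mmt", "underproduction_score")
intercept, slope = simple_ols(xs, ys)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

The sketch uses only the standard library so it runs without extra dependencies; significance tests and power analysis, which the paper's R code covers, are outside its scope.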

Created:
2024-03-06