
Replication data and online supplement for: Underproduction: An Approach for Measuring Risk in Open Source Software

Source: DataONE. Updated: 2021-06-16. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:41ea501edd8cccd13b928d355e45ded15229446e8f849039774dc2224be4de64
Description:
These materials were produced as part of: Champion, Kaylea and Benjamin Mako Hill. (2021) "Underproduction: An approach for measuring risk in open source software." 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Preprint: https://arxiv.org/abs/2103.00352. DOI: 10.1109/SANER50967.2021.00043

In this archive, you'll find:

inst_all_packages_full_results.tab: Summary data on all packages as they appear in the paper. This is the place to look if you want to examine the underproduction factor associated with each package.

inst_all_packages_full_results-DESCRIPTION.txt: A description of the fields in the inst_all_packages_full_results.tab file.

R_Code.tar.gz: R code to reproduce figures and tables from fitted Bayesian hierarchical survival models:
    dfPrep.R, used to create datasets_for_modeling.RData
    models.R, a resource for model information
    model_visualization.R, the core code for presenting fitted models and relationships
    standalone_dsp.R, descriptive statistics
    standalone_bayes.R, to produce tables for the paper
    lib-00-utils.R, some utility functions

datasets_for_modeling.RData: The core dataset used for this analysis.

Stan.tar.gz: A directory of Stan model output. On our supercomputing node these models took multiple days to run and converge.

Figures.tar.gz: A directory of figures from the paper.

Raw_Data_Parsers.tar.gz: A directory of both the raw data and the parsers used to obtain the raw data. The directory contains a HowTo file if you would like to reproduce the scraping/cloning part of the project. Note, however, that the original analysis included an rsync copy of the Debian bug database; if you conduct an analysis from scratch, the data you obtain will have changed since our rsync.

Appendix.tar.gz: Figures and data associated with our appendix, which uses an alternate measure of importance ("vote", which represents recent usage but omits packages where usage does not update atime; the paper used "inst"):
    appendix_with_vote.R, the code
    appendix_figures, a directory of figures similar to those in the paper but produced for the appendix
    vote_all_packages_full_results.csv, summary data on all packages
    vote_all_packages_full_results.csv.DESCRIPTION, a description of the fields in the vote_all_packages_full_results.csv file

For more information, please contact: Kaylea Champion (she/her) kaylea@uw.edu | khascall@gmail.com @kayleachampion

Abstract: The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call "underproduction" which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs. We draw on this application to present two experiments: (1) a demonstration of how our technique can be used to identify at-risk software packages in a large FLOSS repository and (2) a validation of these results using an alternate indicator of package risk. Our analysis demonstrates both the utility of our approach and reveals the existence of widespread underproduction in a range of widely-installed software components in Debian.
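For readers who want a first look at the summary file inst_all_packages_full_results.tab before turning to the R code, the following Python sketch reads a tab-delimited file with a header row. It is not part of the archive. The function name load_tab is hypothetical, and the tab-delimited layout with a header row is an assumption; the actual field names are documented in inst_all_packages_full_results-DESCRIPTION.txt and are not assumed here.

```python
import csv
from pathlib import Path


def load_tab(path):
    """Read a tab-delimited file with a header row.

    Returns (field_names, rows), where each row is a dict keyed by field name.
    Assumption: the file is UTF-8 text, tab-delimited, with one header line.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        fields = list(reader.fieldnames or [])
    return fields, rows


# Example usage (the path assumes the file sits in the working directory):
#   fields, rows = load_tab("inst_all_packages_full_results.tab")
#   print(fields)      # compare against the DESCRIPTION file
#   print(len(rows))   # number of package records read
```

Printing the field names first and checking them against the DESCRIPTION file avoids guessing which column holds the underproduction factor.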
Created:
2023-11-19