Supporting Evidence-Based Software Package Selection in Python: Data, Scripts, and Evaluation

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/n99pxfpf4x
Description:
This dataset accompanies the study on data-driven software package selection in the Python ecosystem and supports the design, development, and evaluation of the PySelect system. It includes all relevant materials generated and used throughout the research process. The repository provides the full results of our large-scale analysis of nearly 800K Python scripts, including frequency distributions, package availability metrics, and domain keyword summaries. It also contains the complete set of data collection and analysis scripts used to extract, normalize, and structure the raw repository data. These scripts were used to process metadata, parse import statements, and generate the knowledge base used in the PySelect system. In addition to the empirical data, we include the outputs from the experimental evaluation of the pipelines, which assess the correctness and performance of the extraction procedures. The repository also contains the indexed knowledge base files generated from normalized usage and domain features, which form the foundation of the recommendation functionality. To support transparency in the conceptual foundations of the system, we provide the literature study compiled during the research, which outlines key themes and related work in software reuse, package recommendation, and empirical software engineering. Finally, all materials from the user study are included, such as survey instruments, anonymized participant responses, and analysis results based on the Technology Acceptance Model (TAM). This archive is intended to support replication, critical review, and further research on empirical approaches to developer tooling and software dependency management. All content is provided for reuse under the accompanying license.
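The description mentions parsing import statements and producing frequency distributions but does not show how. The following is only a minimal sketch of one plausible approach, using Python's standard `ast` module; it is not the dataset's actual script. The function names `top_level_imports` and `package_frequencies` are hypothetical, and the choices to skip relative imports and ignore unparsable files are assumptions.

```python
import ast
from collections import Counter


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by one script (hypothetical helper)."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" counts as a use of the top-level package "os".
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Assumption: relative imports (level > 0) point at local modules, so skip them.
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return packages


def package_frequencies(scripts: list[str]) -> Counter:
    """Count, for each package, how many scripts import it at least once."""
    counts = Counter()
    for source in scripts:
        try:
            counts.update(top_level_imports(source))
        except (SyntaxError, ValueError):
            # Assumption: scripts that cannot be parsed are ignored.
            continue
    return counts
```

Counting each package once per script, rather than once per import statement, keeps a single script with many repeated imports from skewing the distribution.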
Created: 2025-07-28