Supporting Evidence-Based Software Package Selection in Python: Data, Scripts, and Evaluation

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://data.mendeley.com/datasets/n99pxfpf4x

Description:

This dataset accompanies the study on data-driven software package selection in the Python ecosystem and supports the design, development, and evaluation of the PySelect system. It includes all relevant materials generated and used throughout the research process. The repository provides the full results of our large-scale analysis of nearly 800K Python scripts, including frequency distributions, package availability metrics, and domain keyword summaries. It also contains the complete set of data collection and analysis scripts used to extract, normalize, and structure the raw repository data. These scripts were used to process metadata, parse import statements, and generate the knowledge base used in the PySelect system. In addition to the empirical data, we include the outputs from the experimental evaluation of the pipelines, which assess the correctness and performance of the extraction procedures. The repository also contains the indexed knowledge base files generated from normalized usage and domain features, which form the foundation of the recommendation functionality. To support transparency in the conceptual foundations of the system, we provide the literature study compiled during the research, which outlines key themes and related work in software reuse, package recommendation, and empirical software engineering. Finally, all materials from the user study are included, such as survey instruments, anonymized participant responses, and analysis results based on the Technology Acceptance Model (TAM). This archive is intended to support replication, critical review, and further research on empirical approaches to developer tooling and software dependency management. All content is provided for reuse under the accompanying license.
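To illustrate the kind of processing the description refers to when it mentions parsing import statements and producing frequency distributions, here is a minimal sketch using Python's standard `ast` module. It is not taken from the archived scripts: the function names `top_level_imports` and `import_frequencies` are hypothetical, and the choices to skip relative imports and to ignore unparsable files are assumptions made for this example only.

```python
import ast
from collections import Counter


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by one script (hypothetical helper)."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c as x" counts as a use of package "a".
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Assumption: relative imports (level > 0) refer to project-local
            # modules, so they are not counted as package usage.
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return packages


def import_frequencies(scripts: list[str]) -> Counter:
    """Count how many scripts import each top-level package."""
    counts = Counter()
    for source in scripts:
        try:
            counts.update(top_level_imports(source))
        except SyntaxError:
            # Assumption: scripts that do not parse are skipped, not repaired.
            continue
    return counts


if __name__ == "__main__":
    sample = [
        "import numpy as np\nfrom pandas.io import json\n",
        "import numpy\nfrom . import helper\n",
    ]
    print(import_frequencies(sample).most_common())
```

Each package is counted at most once per script, so the result is the number of scripts that use a package, not the number of import lines. The archived pipeline may define frequency differently.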

Created:

2025-07-28
