The Sampling Problem when Mining Inter-Library Usage Patterns

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/10963618

Description:

Tool support in software engineering often depends on relationships, regularities, patterns, or rules mined from sampled code. Examples are approaches to bug prediction, code recommendation, and code autocompletion. Sampling is needed to scale the analysis of the data. Many such samples consist of software projects taken from GitHub; however, the specifics of sampling might influence how well the patterns generalize. In this paper, we focus on how to sample software projects that are clients of libraries and frameworks when mining for inter-library usage patterns. We observe that when the sample is limited to a very specific library, inter-library patterns in the form of implications from one library to another may not generalize well. Using a simulation and a real case study, we analyze different sampling methods. Most importantly, our simulation shows that the implication generalizes well only when sampling for the disjunction of both libraries involved in the implication. Second, we show that real empirical data sampled from GitHub does not behave as we would expect from our simulation. This identifies a potential problem with the usage of such an API for studying inter-library usage patterns.
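The effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation: the population model, the probabilities, and the names `make_population` and `confidence` are assumptions chosen for illustration. It measures the confidence of the implications A -> B and B -> A (the share of projects using the first library that also use the second) in the full population and in three samples.

```python
import random


def make_population(n, p_a, p_b_given_a, p_b_given_not_a, seed=0):
    """Build a hypothetical population of projects as (uses_a, uses_b) pairs."""
    rng = random.Random(seed)
    projects = []
    for _ in range(n):
        uses_a = rng.random() < p_a
        p_b = p_b_given_a if uses_a else p_b_given_not_a
        uses_b = rng.random() < p_b
        projects.append((uses_a, uses_b))
    return projects


def confidence(projects, antecedent, consequent):
    """Confidence of antecedent -> consequent (index 0 is A, index 1 is B)."""
    with_antecedent = [p for p in projects if p[antecedent]]
    if not with_antecedent:
        return None
    hits = sum(1 for p in with_antecedent if p[consequent])
    return hits / len(with_antecedent)


# Assumed parameters: 10% of projects use A; B is far more common among A users.
population = make_population(100_000, 0.10, 0.60, 0.05)

samples = {
    "population": population,
    "uses A": [p for p in population if p[0]],
    "uses B": [p for p in population if p[1]],
    "uses A or B": [p for p in population if p[0] or p[1]],
}

for name, sample in samples.items():
    a_to_b = confidence(sample, 0, 1)
    b_to_a = confidence(sample, 1, 0)
    print(f"{name:12s} conf(A->B)={a_to_b:.3f} conf(B->A)={b_to_a:.3f}")
```

In this toy model, sampling only projects that use B forces the confidence of A -> B to 1, and sampling only projects that use A does the same for B -> A. The disjunctive sample keeps every project that uses either library, so both confidences match the population values.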

Created:

2024-06-08
