Copy-based Reuse in Open Source Software
Source: arXiv | Updated: 2023-12-15 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2312.09370v1
Description:
The Copy-based Reuse in Open Source Software dataset was created at the University of Tennessee, Knoxville, to support research on copy-based reuse across open source software (OSS) as a whole by providing data on copying activity. It captures whole-file copying across nearly all of OSS, detected with an efficient algorithm built on the World of Code infrastructure. The dataset covers approximately 16 billion source code versions; building it involved tracking every version of the source code and identifying instances of copying. Intended uses include research on copy-based reuse, development of related tools, and reducing reuse-related risks such as security vulnerabilities, licensing and copyright problems, and code quality issues.
Provider:
University of Tennessee, Knoxville
Created:
2023-12-15
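
The description says copies are found by tracking every file version and identifying where the same version reappears. As a rough illustration only, and not the dataset authors' algorithm, the idea can be sketched in Python. The sketch assumes each observation is a hypothetical `(content_hash, project, timestamp)` tuple, that identical content hashes mean identical whole files, and that the project where a version first appears is its presumed origin. The function name `find_copies` and the record layout are illustrative and are not part of World of Code.

```python
from collections import defaultdict


def find_copies(observations):
    """Return presumed whole-file copies as (hash, origin, destination, time).

    observations: iterable of (content_hash, project, timestamp) tuples.
    Assumption: the project with the earliest timestamp for a given hash
    is treated as the origin; every other project is a copy destination.
    """
    # For each file version, record when each project first contained it.
    first_seen = defaultdict(dict)
    for content_hash, project, timestamp in observations:
        seen = first_seen[content_hash]
        if project not in seen or timestamp < seen[project]:
            seen[project] = timestamp

    copies = []
    for content_hash, seen in first_seen.items():
        if len(seen) < 2:
            continue  # version exists in a single project: nothing was copied
        # Order projects by first appearance; break ties by project name.
        ordered = sorted(seen.items(), key=lambda item: (item[1], item[0]))
        origin = ordered[0][0]
        for project, timestamp in ordered[1:]:
            copies.append((content_hash, origin, project, timestamp))
    return sorted(copies)
```

Grouping by content hash keeps the work proportional to the number of observations, which matters at the scale the description mentions. A real pipeline would also have to handle forks, unreliable timestamps, and versions that appear in many projects at once.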



