
Dataset of Functions Affected by Public Vulnerabilities in Open-Source Components

Indexed by the National Basic Science Data Center on 2024-03-05
Download link:
https://www.nbsdc.cn/general/dataDetail?id=64ef838ebb16e0591d024a5f&type=1
Resource description:
Developers often call functional modules from third-party libraries to speed up their own projects, yet overlook the vulnerabilities those libraries may contain, which leaves their programs exposed. Although third-party libraries are patched as vulnerabilities are discovered, a project that depends on a library rarely updates it to the latest version promptly, so the project may remain vulnerable. Existing work can identify third-party libraries and detect whether a vulnerable library is in use, but this detection is coarse-grained: it cannot determine whether a project is directly affected by a library vulnerability, that is, whether it actually calls the vulnerable code in the library.

To address this, we collect third-party components with CVE and CWE vulnerabilities from authoritative public vulnerability databases and related public data platforms such as Snyk and Maven. By obtaining and comparing their patch version information, we identify the functions related to each vulnerability. Using function call graphs, we then determine the other API functions in the component that may trigger the vulnerability. The result is a mapping from each third-party library to its list of functions that may trigger vulnerabilities, which forms a database of vulnerable functions in open-source components affected by public vulnerabilities. The database is stored in JSON format and covers 7446 vulnerable versions of 278 open-source components, involving 383 CVE vulnerabilities, with 14.8 GB of data in total.

The dataset is intended to guide developers in choosing third-party libraries and their API modules. It lets developers learn, with fewer resources, whether the library interfaces they use are vulnerable, and helps them avoid interfaces with security risks during development, which improves development efficiency and reduces later testing costs.
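The call-graph step in the description can be illustrated with a minimal sketch. It assumes the call graph is available as a mapping from each caller to its callees, and that the functions changed by a patch are already known; the function names, the toy graph, and the helper `functions_reaching_vulnerable` are hypothetical and are not taken from the dataset or from the authors' tooling. The sketch walks the graph backwards from the patched functions to collect every function that can transitively call them.

```python
from collections import deque


def functions_reaching_vulnerable(call_graph, vulnerable):
    """Return every function that can transitively call a vulnerable one.

    call_graph: dict mapping a caller name to an iterable of callee names.
    vulnerable: set of function names identified from the patch diff.
    The vulnerable functions themselves are included in the result.
    """
    # Invert the caller -> callees graph into callee -> callers.
    callers_of = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers_of.setdefault(callee, set()).add(caller)

    # Breadth-first search backwards from the vulnerable functions.
    affected = set(vulnerable)
    queue = deque(vulnerable)
    while queue:
        fn = queue.popleft()
        for caller in callers_of.get(fn, ()):
            if caller not in affected:  # also guards against call cycles
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical call graph for an imaginary library (illustration only).
call_graph = {
    "Api.parse": ["Parser.read"],
    "Api.render": ["Renderer.draw"],
    "Parser.read": ["Buffer.copy"],
    "Renderer.draw": [],
    "Buffer.copy": [],
}
vulnerable = {"Buffer.copy"}  # assumed to come from comparing patch versions

affected = functions_reaching_vulnerable(call_graph, vulnerable)
print(sorted(affected))
```

In this toy graph, `Api.parse` is reported because it reaches `Buffer.copy` through `Parser.read`, while `Api.render` is not. A project that only calls `Api.render` would therefore not be flagged, which is the fine-grained distinction the description aims for.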
Provider:
Nankai University
Collected summary
Dataset introduction
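Background and challenges
Background overview
This dataset is a database covering 278 open-source components, 7446 vulnerable versions, and 383 CVE vulnerabilities. It focuses on identifying the functions in open-source components that are affected by public vulnerabilities and mapping them to their components. The data is stored in JSON format and totals 14.8 GB. It is intended as a reference on vulnerable functions in third-party libraries, to help developers improve development efficiency and security.
The above content was collected and summarized by 遇见数据集.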