mjbommar/linux-security-meanfield

Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/mjbommar/linux-security-meanfield
Description:
The Linux Security Meanfield Corpus is a commit-keyed corpus of security-relevant commits across 22 Linux base-system repositories, unified under a single schema that carries both CVE-dossiered fixes and non-CVE security-signal commits. It is the Phase-1 release artifact for the mean-field survey paper. The dataset has three parts: cve_dossiered (2,254 rows, one per (fix_commit, CVE) pair from the scope-audited CVE dossier corpus); non_cve_signal (21,609 rows, commit-anchored security-signal rows with no CVE attached, drawn from the kernel history sweep and the 21 non-kernel repository history sweeps); and hard_negatives (1,138 rows, CPE-overmatch CVEs from the dossier's out-of-scope archive, included as a data-quality artifact). The schema covers commit metadata, signal fields, and CVE linkage, among other fields, and the dataset also provides patch files and snapshot information.
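
As a rough illustration of how the three parts relate in size, the sketch below tallies the row counts quoted in the description and includes a hypothetical loader. The repository id and part names come from this listing. Whether each part is exposed as a split (as assumed here) or as a separate configuration is not stated, so `load_part` is an unverified assumption, not the dataset's documented interface.

```python
# Row counts per part, copied from the description above.
PART_ROWS = {
    "cve_dossiered": 2_254,
    "non_cve_signal": 21_609,
    "hard_negatives": 1_138,
}

REPO_ID = "mjbommar/linux-security-meanfield"


def part_shares(rows):
    """Return each part's fraction of the total row count."""
    total = sum(rows.values())
    return {name: count / total for name, count in rows.items()}


def load_part(part):
    """Hypothetical helper: load one part from the Hugging Face Hub.

    ASSUMPTION: the part name is usable as a split name. If the parts are
    published as configurations instead, pass the part as `name=` rather
    than `split=`. Requires the `datasets` package and network access.
    """
    if part not in PART_ROWS:
        raise ValueError(f"unknown part: {part!r}")
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset(REPO_ID, split=part)


if __name__ == "__main__":
    print("total rows:", sum(PART_ROWS.values()))
    for name, share in part_shares(PART_ROWS).items():
        print(f"{name}: {share:.1%}")
```

The tally shows that the non-CVE signal rows make up most of the corpus, so any analysis that mixes parts should probably account for that imbalance.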
Provider:
mjbommar