PatchEval
Source: ModelScope community. Updated 2026-01-06; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/ByteDance/PatchEval
Description:
## 👋 Overview
PatchEval is a benchmark for systematically evaluating LLMs and agents on automated vulnerability repair.
It includes 1,000 vulnerabilities sourced from CVEs reported between 2015 and 2025, covering 65 CWE categories across Go, JavaScript, and Python.
A subset of 230 CVEs is paired with Dockerized sandbox environments that enable runtime patch validation through Proof-of-Concept (PoC) and unit testing.
## 📜 Data Instances Structure
Each vulnerability in the PatchEval dataset is a JSON object with the following structure:
```
cve_id: (str) - The unique CVE identifier from NVD (e.g., CVE-2024-42005).
cve_description: (str) - The official description of the CVE from NVD.
cwe_info: (dict) - A dictionary containing details about the associated Common Weakness Enumeration (CWE).
repo: (str) - The URL of the GitHub repository.
patch_url: (list[str]) - A list of GitHub URLs for the patch.
programing_language: (str) - The primary programming language of the vulnerable code.
vul_func: (list[dict]) - A list of vulnerable code snippets.
fix_func: (list[dict]) - A list of fixed code snippets.
vul_patch: (str) - The patch diff of the CVE.
poc_test_cmd: (str or null) - The command to execute the Proof-of-Concept (PoC) test within the provided Docker environment. A null value indicates that no PoC environment is available.
unit_test_cmd: (str or null) - The command to execute the unit test within the provided Docker environment. A null value indicates that no unit test is available.
```
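As an illustration of the structure above, here is a minimal Python sketch that checks one record against the listed fields and reports whether it has runtime validation. The sample record, its values, and the helper names `validate_record` and `has_runtime_validation` are hypothetical and not part of the dataset or its tooling. The inner keys of `cwe_info`, `vul_func`, and `fix_func` are not documented here, so the sketch only checks top-level types.

```python
import json

# Top-level fields and types as listed in the structure above.
EXPECTED_FIELDS = {
    "cve_id": str,
    "cve_description": str,
    "cwe_info": dict,
    "repo": str,
    "patch_url": list,
    "programing_language": str,  # spelled this way in the field list
    "vul_func": list,
    "fix_func": list,
    "vul_patch": str,
}
# These two fields may be null (None in Python) when no test is available.
NULLABLE_FIELDS = ("poc_test_cmd", "unit_test_cmd")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    for name in NULLABLE_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
        elif record[name] is not None and not isinstance(record[name], str):
            problems.append(f"wrong type for {name}")
    return problems


def has_runtime_validation(record):
    """True if the record has a PoC test command or a unit test command."""
    return any(record.get(name) is not None for name in NULLABLE_FIELDS)


# Hypothetical record with placeholder values, for illustration only.
SAMPLE_JSON = """
{
  "cve_id": "CVE-0000-00000",
  "cve_description": "Placeholder description.",
  "cwe_info": {},
  "repo": "https://github.com/example/project",
  "patch_url": ["https://github.com/example/project/commit/abc123"],
  "programing_language": "Python",
  "vul_func": [],
  "fix_func": [],
  "vul_patch": "",
  "poc_test_cmd": "pytest tests/poc.py",
  "unit_test_cmd": null
}
"""

record = json.loads(SAMPLE_JSON)
print(validate_record(record))
print(has_runtime_validation(record))
```

Filtering a loaded list of records with `has_runtime_validation` is one way to select the entries that come with a sandbox test command.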
## 📖 Citation
If you find PatchEval useful for your research and applications, feel free to give us a star ⭐ or cite us using:
```bibtex
@misc{wei2025patcheval,
title={PATCHEVAL: A New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities},
author={Zichao Wei and Jun Zeng and Ming Wen and Zeliang Yu and Kai Cheng and Yiding Zhu and Jingyi Guo and Shiqi Zhou and Le Yin and Xiaodong Su and Zhechao Ma},
year={2025},
eprint={2511.11019},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2511.11019},
}
```
## ✍️ License
This project is licensed under the Apache License 2.0.
Provider:
maas
Created:
2025-11-18



