LKRepair: Automated Patch Generation for Linux Kernel Vulnerabilities via Syntax-Aware LLMs
Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.13338271
Resource description:
LKRepair is an LLM-based framework that automatically generates defect-fixing patches for the Linux kernel. It comprises the LKRD dataset, the long-prompt optimization method CoLPO, and subsystems for applying Linux kernel patches at the source-code level, multi-node automatic compilation, and multi-node automated verification of fixes. The source code and dataset are hosted on two open platforms: GitHub and Zenodo.
LKRD is the framework's domain-specific dataset. It supplies real-world bug-fixing context to the framework and can also serve as training data for fine-tuning LLMs on Linux kernel bug fixing. The dataset covers defects and fix patches released by the kernel community between 2017 and 2025: 333 kernel sub-versions and 324 major commits, spanning 97 core submodules and 4 cross-subsystem modules. After filtering the collected data, we obtained 9,286 complete samples, including 3,112 unique bugs, 2,669 patches, 6,560 source code blocks before and after fixes, and 7,275 patch code blocks. The dataset is stored in a MongoDB database and is approximately 5.32 GB (1.25 GB compressed).
Every sample, whether in LKRD or in a derived subset such as LKRD-C, has 15 feature fields: Commit ID, Crash Log, Kconfig, POC-c, POC-syz, Commit ID-fix, CommitID-parent, Patch Code, Email List, Sub-system, Kernel Version, Source Code Chunk-bug, Source Code Chunk-fix, Patch Code Chunk, and Bug Source File Name, plus some fields used for development tracking.
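The 15 feature fields can be turned into a quick completeness check for a sample. The following is a minimal sketch, not part of LKRepair: it assumes a sample is available as a Python mapping (for example, a document read from MongoDB) and that the keys are spelled exactly as in the description, which may differ from the actual stored key names. `missing_fields` is a hypothetical helper.

```python
from typing import Any, Mapping

# The 15 feature fields named in the description. The exact key spelling
# inside the stored documents is an assumption.
LKRD_FIELDS = (
    "Commit ID",
    "Crash Log",
    "Kconfig",
    "POC-c",
    "POC-syz",
    "Commit ID-fix",
    "CommitID-parent",
    "Patch Code",
    "Email List",
    "Sub-system",
    "Kernel Version",
    "Source Code Chunk-bug",
    "Source Code Chunk-fix",
    "Patch Code Chunk",
    "Bug Source File Name",
)


def missing_fields(sample: Mapping[str, Any]) -> list[str]:
    """Return the feature fields that are absent from a sample (hypothetical helper)."""
    return [name for name in LKRD_FIELDS if name not in sample]
```

A sample for which `missing_fields` returns an empty list carries all 15 fields; extra development-tracking fields are ignored by the check.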
Dataset
Release address: https://doi.org/10.5281/zenodo.13338271
After extracting B-P-Pair.zip, rename the folder to ``patch_pair`` and copy it to the ``pp_pair/`` directory in the source code.
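The extract, rename, and copy step can also be scripted. This is a sketch under stated assumptions: the layout inside B-P-Pair.zip is not documented here, so the code assumes a single top-level folder and falls back to treating the extracted files as the folder contents. `install_patch_pairs` is a hypothetical helper, not a function shipped with LKRepair.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_patch_pairs(archive: Path, project_root: Path) -> Path:
    """Extract the archive and place its contents at pp_pair/patch_pair."""
    target = project_root / "pp_pair" / "patch_pair"
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "extracted"
        staging.mkdir()
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(staging)

        # Assumption: the archive holds one top-level folder. Otherwise the
        # extracted files themselves become the contents of patch_pair.
        entries = list(staging.iterdir())
        if len(entries) == 1 and entries[0].is_dir():
            source = entries[0]
        else:
            source = staging

        # Create pp_pair/ if needed, then move the folder in under its new name.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
    return target
```

For example, `install_patch_pairs(Path("B-P-Pair.zip"), Path("LKRepair2025"))` would leave the data in `LKRepair2025/pp_pair/patch_pair`. The function refuses to overwrite an existing `patch_pair` folder.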
The data in lkrd.zip and lkrd-C.zip is in MongoDB format and must be imported into a database named Scrapy_DB2025. lkrd.zip holds the raw dataset of Linux bug fixes; after filtering, LKRD contains 9,286 complete samples, including 3,112 unique bugs and 2,669 patches. lkrd-C.zip holds the result of initial processing on LKRD (B-P construction, code block segmentation, and semantic annotation), which yields 8,827 records.
Both datasets must be imported into the database before LKRepair is run; otherwise the program will not run properly.
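The import can be prepared with the standard MongoDB command-line tools. The sketch below only builds the command and does not run it. It assumes the unpacked archives contain either `.bson` files (as written by `mongodump`) or `.json` files (as written by `mongoexport`); the actual file layout and the collection names are not stated here, so `import_command` and its `collection` argument are hypothetical.

```python
from pathlib import Path

DB_NAME = "Scrapy_DB2025"  # database name required by LKRepair


def import_command(data_file: Path, collection: str) -> list[str]:
    """Build (but do not run) a MongoDB import command for one data file."""
    suffix = data_file.suffix.lower()
    if suffix == ".bson":
        # BSON dumps are restored with mongorestore.
        return [
            "mongorestore",
            f"--db={DB_NAME}",
            f"--collection={collection}",
            str(data_file),
        ]
    if suffix == ".json":
        # JSON exports are loaded with mongoimport.
        return [
            "mongoimport",
            f"--db={DB_NAME}",
            f"--collection={collection}",
            f"--file={data_file}",
        ]
    raise ValueError(f"unsupported file type: {data_file.name}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the file type and collection name have been confirmed against the repository's `readme.md`.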
Source Code
Release address: https://github.com/HNUSystemsLab/LKRepair
We recommend naming the project ``LKRepair2025`` and setting up the development environment using Anaconda. First, carefully read the contents of `readme.md` in the root directory. Next, import `lkr1.yaml` from the root directory to initialize the library environment. Finally, install the database and import the data from `LKRD` and `LKRD-C`.
The ``pp_pair`` directory in the root directory contains the unzipped B-P-Pair data. The ``LLM`` folder contains prompt templates and the source code for the local and remote LLM API interfaces. The ``plugins`` directory holds plugin interfaces, such as the CoLPO plugin for LKRepair; developers can use it as a starting point for exploring third-party plugins. The `config.py` file in the `public` directory configures the project's basic information, directory structure, LLM API keys, and other data.
The project is started from a simple command-line interface, and every file includes comments.
Provider:
Zenodo
Created:
2026-03-26



