LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM

Name: LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM
Creator: figshare
Published: 2026-04-27 13:28:58
License: No description available

DataCite Commons | Updated 2026-04-27 | Indexed 2026-02-09

Download link:

https://figshare.com/articles/dataset/LAURA_Enhancing_Code_Review_Generation_with_Context-Enriched_Retrieval-Augmented_LLM/27367194

Description:

Introduction

LAURA is an LLM-based, retrieval-augmented, context-aware framework for code review generation. It integrates context augmentation, review exemplar retrieval, and prompt tuning to improve how well LLMs (in our study, ChatGPT-4o and DeepSeek v3) generate code review comments. The experiments show that LAURA outperforms the direct application of ChatGPT-4o and DeepSeek v3 to code review generation and significantly surpasses the pre-trained model CodeReviewer.

Since our experiments are based on ChatGPT-4o and DeepSeek v3, we release the data processing code and the datasets used in our research. The code section includes the Python scripts we used for data collection, cleaning, merging, and retrieval. The dataset section contains 301k entries from 1,807 high-quality GitHub projects covering four programming languages: C, C++, Java, and Python. We also provide the time-split dataset used as the retrieval database (which is also used to fine-tune CodeReviewer) and the human-annotated evaluation dataset.

File Structure

codes: data collection, filtering, and post-processing code used in our study
  - data_collection_and_filtering.py: collects data via the GitHub GraphQL API and filters it with rule-based and LLM-based methods
  - data_embedding.py: embeds the data
  - data_merging.py: merges review comments that target the same diff
  - data_retrieval.py: retrieves data
  - diff_extension.py: extends code diffs by integrating the full code context into them
datasets: datasets built and used in our study
  - database_for_retrieve.csv: the dataset built for retrieval-augmented generation, containing 298,494 entries dated before December 26, 2024
  - evaluation_data.csv: the manually annotated evaluation dataset, containing 384 entries dated after December 26, 2024
  - full_dataset.csv: the full collected dataset, containing 301,256 entries
  - basic_eval_384_res.xlsx: the original evaluation results for the 384 test instances; a small number of longer comments may be partially truncated due to Excel's 8,192-character limit, while all other data remains intact
prompts: prompts used in data filtering, generation, and evaluation
  - direct_generation.txt: the prompt used for direct generation (baselines)
  - LAURA_generation.txt: the prompt used for LAURA generation
  - LLM_evaluation.txt: the prompt used for LLM evaluation
  - LLM_filtering.txt: the prompt used for LLM-based filtering during data filtering
README.md: description of our submission
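The released scripts are not reproduced here. As a rough illustration of how the merging, time-split, and exemplar-retrieval steps described above could fit together, the following is a minimal Python sketch under stated assumptions. All function names (merge_by_diff, time_split, embed, cosine, retrieve), the record fields, and the toy records are hypothetical. The bag-of-words embedding is a stand-in for whatever model data_embedding.py actually uses, and dating a merged entry by its earliest comment, as well as treating entries dated exactly December 26, 2024 as held out, are assumptions.

```python
# Hypothetical sketch: merge comments by diff, time-split, retrieve exemplars.
# Names, fields, cutoff handling, and the bag-of-words "embedding" are
# illustrative assumptions, not the released LAURA scripts.
import math
from collections import Counter, defaultdict
from datetime import date

CUTOFF = date(2024, 12, 26)  # assumed boundary between retrieval and evaluation data


def merge_by_diff(records):
    """Group review comments that target the same diff into a single entry."""
    comments = defaultdict(list)
    earliest = {}
    for rec in records:
        diff = rec["diff"]
        comments[diff].append(rec["comment"])
        # Date each merged entry by its earliest comment (an assumption).
        earliest[diff] = min(earliest.get(diff, rec["created"]), rec["created"])
    return [{"diff": d, "comments": c, "created": earliest[d]} for d, c in comments.items()]


def time_split(entries, cutoff=CUTOFF):
    """Entries before the cutoff form the retrieval database; the rest are held out."""
    database = [e for e in entries if e["created"] < cutoff]
    held_out = [e for e in entries if e["created"] >= cutoff]
    return database, held_out


def embed(text):
    """Stand-in embedding: whitespace-token counts instead of a learned model."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query_diff, database, k=1):
    """Return the k database entries whose diffs are most similar to the query diff."""
    q = embed(query_diff)
    return sorted(database, key=lambda e: cosine(q, embed(e["diff"])), reverse=True)[:k]


# Toy records (hypothetical); the first two comments target the same diff.
records = [
    {"diff": "- x = 1\n+ x = 2", "comment": "Why change the default?", "created": date(2024, 5, 1)},
    {"diff": "- x = 1\n+ x = 2", "comment": "Please add a test.", "created": date(2024, 5, 3)},
    {"diff": "+ def load(path): return open(path).read()", "comment": "Use a context manager.", "created": date(2024, 8, 9)},
    {"diff": "+ free(buf); free(buf);", "comment": "Double free.", "created": date(2025, 1, 10)},
]
entries = merge_by_diff(records)
database, held_out = time_split(entries)
best = retrieve("+ def read(path): return open(path).read()", database)[0]
print(best["comments"])  # ['Use a context manager.']
```

In a realistic setting, embed would be replaced by a learned code or text embedding and the linear sort by a vector index; which model and index the released data_embedding.py and data_retrieval.py use is not stated on this page.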

Provider:

figshare

Created:

2024-10-31