LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM
Source: DataCite Commons | Updated: 2026-04-27 | Indexed: 2026-02-09
Download link:
https://figshare.com/articles/dataset/LAURA_Enhancing_Code_Review_Generation_with_Context-Enriched_Retrieval-Augmented_LLM/27367194
Resource description:
<b>LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM</b><br>
<b>Introduction</b><br>
LAURA is an LLM-based, retrieval-augmented, context-aware framework for code review generation. It integrates context augmentation, review exemplar retrieval, and prompt tuning to improve the performance of LLMs (in our study, ChatGPT-4o and DeepSeek v3) in generating code review comments. Our experiments show that LAURA outperforms the direct application of ChatGPT-4o and DeepSeek v3 for code review generation and significantly surpasses the pre-trained model CodeReviewer.<br>
Since our experiments are based on ChatGPT-4o and DeepSeek v3, we have released the data processing code and dataset used in our research. The code section includes the Python scripts we used for data collection, cleaning, merging, and retrieval. The dataset section contains 301k entries from 1,807 high-quality projects sourced from GitHub, covering four programming languages: C, C++, Java, and Python.
We also provide the time-split dataset used as the retrieval database (which is also used for fine-tuning CodeReviewer) and the human-annotated evaluation dataset.<br>
<b>File Structure</b><br>
<b>codes</b>: Data collection, filtering, and post-processing code used in our study<br>
<b>data_collection_and_filtering.py</b>: Code for collecting data via the GitHub GraphQL API and filtering it with rule-based and LLM-based methods<br>
<b>data_embedding.py</b>: Code for data embedding<br>
<b>data_merging.py</b>: Code for data merging, used to merge review comments that share the same target diff<br>
<b>data_retrieval.py</b>: Code for data retrieval<br>
<b>diff_extension.py</b>: Code for extending the code diffs by integrating the full code context into the diffs<br>
<b>datasets</b>: Datasets built and used in our study<br>
<b>database_for_retrieve.csv</b>: The dataset we built for retrieval-augmented generation, containing 298,494 entries dated before December 26, 2024<br>
<b>evaluation_data.csv</b>: The evaluation dataset we manually annotated, containing 384 entries dated after December 26, 2024<br>
<b>full_dataset.csv</b>: The full dataset we collected, containing 301,256 entries<br>
<b>basic_eval_384_res.xlsx</b>: The original evaluation results for the 384 test instances; a small number of longer comments may be partially truncated due to Excel’s 8192-character limit, while all other data remains intact<br>
<b>prompts</b>: The prompts used in data filtering, generation, and evaluation<br>
<b>direct_generation.txt</b>: The prompt we used for direct generation as a baseline<br>
<b>LAURA_generation.txt</b>: The prompt we used for LAURA generation<br>
<b>LLM_evaluation.txt</b>: The prompt we used for LLM evaluation<br>
<b>LLM_filtering.txt</b>: The prompt we used for LLM filtering in the data filtering process<br>
<b>README.md</b>: Description of our submission<br>
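The merging and time-split steps described in the file list can be sketched roughly as follows. This is a minimal illustration and not the released scripts: the field names (`diff`, `comment`, `created`), the sample rows, the helper names `merge_by_diff` and `time_split`, and the choice to keep the earliest date for a merged entry are all assumptions made for this example. Only the December 26, 2024 cutoff comes from the description.

```python
from datetime import date

# Cutoff taken from the dataset description; everything else is hypothetical.
CUTOFF = date(2024, 12, 26)

# Toy rows standing in for collected review comments (assumed layout).
records = [
    {"diff": "-x = 1\n+x = 2", "comment": "Why change x?", "created": date(2024, 11, 3)},
    {"diff": "-x = 1\n+x = 2", "comment": "Add a test.", "created": date(2024, 11, 5)},
    {"diff": "+import os", "comment": "Unused import.", "created": date(2025, 1, 10)},
]


def merge_by_diff(rows):
    """Combine review comments that target the same diff into one entry."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["diff"],
            {"diff": row["diff"], "comments": [], "created": row["created"]},
        )
        entry["comments"].append(row["comment"])
        # Assumption: a merged entry is dated by its earliest comment.
        entry["created"] = min(entry["created"], row["created"])
    return list(grouped.values())


def time_split(rows, cutoff=CUTOFF):
    """Split entries into a retrieval database and an evaluation pool.

    Mirrors the wording "prior to" and "later than" with strict comparisons,
    so an entry dated exactly on the cutoff lands in neither set.
    """
    retrieval = [r for r in rows if r["created"] < cutoff]
    evaluation = [r for r in rows if r["created"] > cutoff]
    return retrieval, evaluation


merged = merge_by_diff(records)
retrieval_db, evaluation_pool = time_split(merged)
```

Splitting by time rather than at random keeps every retrieval exemplar older than the evaluation entries, so no exemplar postdates the change being reviewed.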
Provider:
figshare
Created:
2024-10-31



