Enhancing Pull Request Reviews: Leveraging Large Language Models to Detect Inconsistencies Between Issue Descriptions and Code Changes Replication Package

Name: Enhancing Pull Request Reviews: Leveraging Large Language Models to Detect Inconsistencies Between Issue Descriptions and Code Changes Replication Package
Creator: figshare
Published: 2025-10-06 09:49:54
License: No description available

DataCite Commons | Updated 2025-10-06 | Indexed 2026-04-25

Download link:

https://figshare.com/articles/dataset/Enhancing_Pull_Request_Reviews_Leveraging_Large_Language_Models_to_Detect_Inconsistencies_Between_Issue_Descriptions_and_Code_Changes_Replication_Package/27896433/1

Resource description:

Replication Package

This replication package contains all the necessary files, data, and scripts to replicate the experiments and results presented in the study. Below is a detailed description of the folder structure and contents of the replication package.

General Information

Three prompts with different inputs were used to evaluate each model:
- The first prompt uses the issue and PR texts.
- The second prompt uses the issue text and the PR diff.
- The third prompt uses the issue text, the PR text, and the PR diff.

Each model was run 4 times with each prompt.

Folder Structure

├── analysis
│   ├── confusion_matrix
│   ├── performance_metrics
├── model_outputs
├── notebooks_and_manually_labeled_data

<code>analysis</code>

This folder contains outputs and metrics generated during the analysis of model performance.

<code>analysis/confusion_matrix</code>

This subfolder contains confusion matrix visualizations for different models and prompts, comparing predicted labels against true labels. These were generated using the <code>analysis.ipynb</code> notebook.

Files
- <code>gpt4o-mini_prompt_prompt1_confusion-matrix.png</code>: Confusion matrix for GPT-4o-mini with Prompt 1.
- <code>gpt4o-mini_prompt_prompt2_confusion-matrix.png</code>: Confusion matrix for GPT-4o-mini with Prompt 2.
- <code>gpt4o-mini_prompt_prompt3_confusion-matrix.png</code>: Confusion matrix for GPT-4o-mini with Prompt 3.
- <code>gpt4o_prompt_prompt1_confusion-matrix.png</code>: Confusion matrix for GPT-4o with Prompt 1.
- <code>gpt4o_prompt_prompt2_confusion-matrix.png</code>: Confusion matrix for GPT-4o with Prompt 2.
- <code>gpt4o_prompt_prompt3_confusion-matrix.png</code>: Confusion matrix for GPT-4o with Prompt 3.
- <code>Meta-Llama-405B_prompt_prompt1_confusion-matrix.png</code>: Confusion matrix for Meta-Llama-405B with Prompt 1.
- <code>Meta-Llama-405B_prompt_prompt2_confusion-matrix.png</code>: Confusion matrix for Meta-Llama-405B with Prompt 2.
- <code>Meta-Llama-405B_prompt_prompt3_confusion-matrix.png</code>: Confusion matrix for Meta-Llama-405B with Prompt 3.
- <code>Meta-Llama-70B_prompt_prompt1_confusion-matrix.png</code>: Confusion matrix for Meta-Llama-70B with Prompt 1.
- <code>Meta-Llama-70B_prompt_prompt2_confusion-matrix.png</code>: Confusion matrix for Meta-Llama-70B with Prompt 2.
- <code>Meta-Llama-70B_prompt_prompt3_confusion-matrix.png</code>: Confusion matrix for Meta-Llama-70B with Prompt 3.

<code>analysis/performance_metrics</code>

This subfolder contains JSON files with performance metrics for different models and prompts, detailing evaluation results such as accuracy, F1 scores, precision, recall, and specificity. These metrics complement the confusion matrices.

Files
- <code>gpt4o-mini-responses-with-predictions-metrics.json</code>: Metrics for GPT-4o-mini across different prompts.
- <code>gpt4o-responses-with-predictions-metrics.json</code>: Metrics for GPT-4o across different prompts.
- <code>Meta-Llama-405B-responses-with-predictions-metrics.json</code>: Metrics for Meta-Llama-405B across different prompts.
- <code>Meta-Llama-70B-responses-with-predictions-metrics.json</code>: Metrics for Meta-Llama-70B across different prompts.

Each JSON file includes:
- Accuracy: Overall correctness of predictions.
- F1 Scores (weighted, micro, macro): Balance between precision and recall.
- Precision and Recall: Quality and coverage of positive-case predictions.
- Specificity: True negative rate for each label, that is, how well false positives are avoided.
- Consistency: Agreement of predictions across iterations.

<code>model_outputs</code>

This folder contains the raw outputs generated by each model for all iterations and prompts, as well as the final predictions obtained by selecting the most common label across iterations for each prompt/index.

Files
- <code>gpt4o-mini-responses-with-predictions.json</code>: Raw outputs from GPT-4o-mini for all iterations and prompts, along with final predictions.
- <code>gpt4o-responses-with-predictions.json</code>: Raw outputs from GPT-4o for all iterations and prompts, along with final predictions.
- <code>Meta-Llama-70B-responses-with-predictions.json</code>: Raw outputs from Meta-Llama-70B for all iterations and prompts, along with final predictions.
- <code>Meta-Llama-405B-responses-with-predictions.json</code>: Raw outputs from Meta-Llama-405B for all iterations and prompts, along with final predictions.

Each file includes:
- Iteration Outputs: The model responses for all iterations and prompts, capturing the variability in predictions.
- Final Predictions: Derived by taking the most common label across iterations for a specific prompt/index.

<code>notebooks_and_manually_labeled_data</code>

This folder contains Jupyter notebooks used for model interaction and analysis, manually labeled data, and pre-fetched data from GitHub for convenience.

Files
- <code>Transformer_Labels.xlsx</code>: Contains the manually labeled data for the study, including issue and PR pairs along with their respective labels. Serves as the ground truth for evaluating model performance.
- <code>prefetched_data.pkl</code>: A pre-fetched dataset containing GitHub pull requests (PRs) and associated issue data. Provides a convenient starting point for analyses without needing to fetch data from GitHub repeatedly.
- <code>Azure_ML_lab.ipynb</code>: Jupyter notebook used to interact with models on the Azure Machine Learning (ML) service. Includes code for prompting the models, retrieving responses, and storing outputs for further analysis.
- <code>analyze.ipynb</code>: Jupyter notebook for analyzing the outputs generated by the models. Contains code for extracting insights, generating confusion matrices, and calculating performance metrics.
- <code>Data_Scraper_Notebook.ipynb</code>: Contains the code used for scraping data from various sources, including setup and execution steps.
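The description of <code>model_outputs</code> says that final predictions are obtained by selecting the most common label across the four iterations for each prompt/index. A minimal sketch of that kind of majority vote is shown below. It is not the package's actual code: the label names (`consistent`, `inconsistent`), the `runs` dictionary layout, the tie-breaking rule, and the `agreement` measure are all assumptions made for illustration, and the package may define consistency differently.

```python
from collections import Counter


def majority_label(labels):
    """Return the most common label among the iteration outputs.

    Assumption: on a tie (possible with 4 runs), the label that
    appeared first in the list wins. The package may break ties differently.
    """
    counts = Counter(labels)
    # Counter keeps insertion order, and max() returns the first maximum,
    # so ties resolve to the label seen first.
    return max(counts, key=counts.get)


def agreement(labels):
    """Hypothetical consistency measure: share of runs matching the majority."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical layout: (prompt, index) -> labels from the 4 iterations.
runs = {
    ("prompt1", 0): ["consistent", "consistent", "inconsistent", "consistent"],
    ("prompt1", 1): ["inconsistent", "inconsistent", "inconsistent", "inconsistent"],
    ("prompt2", 0): ["inconsistent", "consistent", "inconsistent", "consistent"],
}

final_predictions = {key: majority_label(labels) for key, labels in runs.items()}
agreement_scores = {key: agreement(labels) for key, labels in runs.items()}

for key in runs:
    print(key, final_predictions[key], agreement_scores[key])
```

With an even number of iterations a 2-2 split is possible, so whichever tie-breaking rule is used affects the final label for those items; the third entry above illustrates that case.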
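The metrics listed for <code>analysis/performance_metrics</code> (accuracy, precision, recall, specificity, F1) can all be derived from confusion-matrix counts. The sketch below shows the standard one-vs-rest definitions for a single label in plain Python. The function names, the label names, and the toy data are hypothetical, and the notebooks in the package may compute these values with a library instead.

```python
def per_label_metrics(y_true, y_pred, label):
    """Compute one-vs-rest metrics for `label` from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    tn = sum(t != label and p != label for t, p in pairs)  # true negatives

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data with hypothetical label names (not taken from the package).
y_true = ["inconsistent", "inconsistent", "consistent",
          "consistent", "consistent", "inconsistent"]
y_pred = ["inconsistent", "consistent", "consistent",
          "consistent", "inconsistent", "inconsistent"]

metrics = per_label_metrics(y_true, y_pred, "inconsistent")
overall_accuracy = accuracy(y_true, y_pred)
print(metrics, overall_accuracy)
```

Macro F1 is the unweighted mean of the per-label F1 values, weighted F1 weights each label by its number of true instances, and micro F1 pools the counts across labels before computing the score.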

Provider: figshare

Created: 2025-10-06
