Enhancing Pull Request Reviews: Leveraging Large Language Models to Detect Inconsistencies Between Issue Descriptions and Code Changes Replication Package

Source: NIAID Data Ecosystem. Indexed 2026-05-10.

Download link:

https://figshare.com/articles/dataset/Enhancing_Pull_Request_Reviews_Leveraging_Large_Language_Models_to_Detect_Inconsistencies_Between_Issue_Descriptions_and_Code_Changes_Replication_Package/27896433


Resource description:

Replication Package

This replication package contains all the files, data, and scripts needed to replicate the experiments and results presented in the study. The folder structure and contents are described below.

General Information

Three prompts, each with different inputs, were used to evaluate each model:
Prompt 1 uses the issue text and the PR text.
Prompt 2 uses the issue text and the PR diff.
Prompt 3 uses the issue text, the PR text, and the PR diff.
Each model was run 4 times with each prompt.

Folder Structure

├── analysis
│   ├── confusion_matrix
│   ├── performance_metrics
├── model_outputs
├── notebooks_and_manually_labeled_data

analysis

This folder contains outputs and metrics generated during the analysis of model performance.

analysis/confusion_matrix

This subfolder contains confusion matrix visualizations for each model and prompt, comparing predicted labels against true labels. They were generated using the analyze.ipynb notebook.

Files

gpt4o-mini_prompt_prompt1_confusion-matrix.png: Confusion matrix for GPT-4o-mini with Prompt 1.
gpt4o-mini_prompt_prompt2_confusion-matrix.png: Confusion matrix for GPT-4o-mini with Prompt 2.
gpt4o-mini_prompt_prompt3_confusion-matrix.png: Confusion matrix for GPT-4o-mini with Prompt 3.
gpt4o_prompt_prompt1_confusion-matrix.png: Confusion matrix for GPT-4o with Prompt 1.
gpt4o_prompt_prompt2_confusion-matrix.png: Confusion matrix for GPT-4o with Prompt 2.
gpt4o_prompt_prompt3_confusion-matrix.png: Confusion matrix for GPT-4o with Prompt 3.
Meta-Llama-405B_prompt_prompt1_confusion-matrix.png: Confusion matrix for Meta-Llama-405B with Prompt 1.
Meta-Llama-405B_prompt_prompt2_confusion-matrix.png: Confusion matrix for Meta-Llama-405B with Prompt 2.
Meta-Llama-405B_prompt_prompt3_confusion-matrix.png: Confusion matrix for Meta-Llama-405B with Prompt 3.
Meta-Llama-70B_prompt_prompt1_confusion-matrix.png: Confusion matrix for Meta-Llama-70B with Prompt 1.
Meta-Llama-70B_prompt_prompt2_confusion-matrix.png: Confusion matrix for Meta-Llama-70B with Prompt 2.
Meta-Llama-70B_prompt_prompt3_confusion-matrix.png: Confusion matrix for Meta-Llama-70B with Prompt 3.

analysis/performance_metrics

This subfolder contains JSON files with performance metrics for each model and prompt, covering accuracy, F1 scores, precision, recall, and specificity. These metrics complement the confusion matrices.

Files

gpt4o-mini-responses-with-predictions-metrics.json: Metrics for GPT-4o-mini across different prompts.
gpt4o-responses-with-predictions-metrics.json: Metrics for GPT-4o across different prompts.
Meta-Llama-405B-responses-with-predictions-metrics.json: Metrics for Meta-Llama-405B across different prompts.
Meta-Llama-70B-responses-with-predictions-metrics.json: Metrics for Meta-Llama-70B across different prompts.

Each JSON file includes:

Accuracy: Overall correctness of predictions.
F1 Scores (weighted, micro, macro): Evaluation of precision and recall balance.
Precision and Recall: Measurement of positive case predictions.
Specificity: False positive avoidance rate for each label.
Consistency: Consistency of predictions between iterations.

model_outputs

This folder contains the raw outputs generated by each model for all iterations and prompts, as well as the final predictions obtained by selecting the most common label across iterations for each prompt/index.

Files

gpt4o-mini-responses-with-predictions.json: Raw outputs from GPT-4o-mini for all iterations and prompts, along with final predictions.
gpt4o-responses-with-predictions.json: Raw outputs from GPT-4o for all iterations and prompts, along with final predictions.
Meta-Llama-70B-responses-with-predictions.json: Raw outputs from Meta-Llama-70B for all iterations and prompts, along with final predictions.
Meta-Llama-405B-responses-with-predictions.json: Raw outputs from Meta-Llama-405B for all iterations and prompts, along with final predictions.

Each file includes:

Iteration Outputs: The model responses for all iterations and prompts, capturing the diversity and variability in predictions.
Final Predictions: Derived by aggregating the most common label across iterations for a specific prompt/index.

notebooks_and_manually_labeled_data

This folder contains Jupyter notebooks used for model interaction and analysis, manually labeled data, and pre-fetched data from GitHub for convenience.

Files

Transformer_Labels.xlsx: Contains the manually labeled data for the study, including issue and PR pairs along with their respective labels. Serves as the ground truth for evaluating model performance.
prefetched_data.pkl: A pre-fetched dataset containing GitHub pull requests (PRs) and associated issue data. Provides a convenient starting point for analyses without needing to fetch data from GitHub repeatedly.
Azure_ML_lab.ipynb: Jupyter notebook used to interact with models on the Azure Machine Learning (ML) service. Includes code for prompting the models, retrieving responses, and storing outputs for further analysis.
analyze.ipynb: Jupyter notebook for analyzing the outputs generated by the models. Contains code for extracting insights, generating confusion matrices, and calculating performance metrics.
Data_Scraper_Notebook.ipynb: Contains the code used for scraping data from various sources, including setup and execution steps.
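
The aggregation and metrics described in this package (final prediction by most common label across the 4 runs, accuracy, per-label specificity, and consistency between iterations) can be illustrated with a minimal Python sketch. It is not the code from analyze.ipynb. The label names, the toy data, the in-memory layout, and all function names are hypothetical, since the JSON schema is not documented here, and the consistency definition used (share of items on which every run agrees) is an assumption.

```python
from collections import Counter


def majority_label(run_labels):
    """Most common label across runs; ties go to the label seen first."""
    return Counter(run_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def specificity(y_true, y_pred, label):
    """TN / (TN + FP), treating `label` as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    return tn / (tn + fp) if (tn + fp) else float("nan")


def consistency(runs_per_item):
    """Assumed definition: share of items where all runs give the same label."""
    return sum(len(set(r)) == 1 for r in runs_per_item) / len(runs_per_item)


# Hypothetical toy data: 4 runs per item; label names are placeholders.
runs_per_item = [
    ["consistent", "consistent", "inconsistent", "consistent"],
    ["inconsistent", "inconsistent", "inconsistent", "inconsistent"],
    ["consistent", "inconsistent", "inconsistent", "inconsistent"],
    ["consistent", "consistent", "consistent", "consistent"],
]
y_true = ["consistent", "inconsistent", "consistent", "consistent"]

# Final prediction per item = most common label across its runs.
y_pred = [majority_label(runs) for runs in runs_per_item]

print("final predictions:", y_pred)
print("accuracy:", accuracy(y_true, y_pred))
print("specificity (inconsistent):", specificity(y_true, y_pred, "inconsistent"))
print("consistency:", consistency(runs_per_item))
```

With an even number of runs a 2-2 tie is possible. This sketch resolves it in favor of the label that appears first, which may differ from the tie-breaking used in the study.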

Created:

2025-10-06
