Replication Package for the Paper: "An Insight into Security Code Review with LLMs: Capabilities, Obstacles and Influential Factors".

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/13771088


Resource description:

This is the replication package for the paper "An Insight into Security Code Review with LLMs: Capabilities, Obstacles and Influential Factors". The replication package is organized into three folders:

1. RQ1 Performance of LLMs

- Five prompt templates.pdf
  This PDF shows the detailed structures of the five prompt templates designed in Section 3.3.2 of our paper.
- source code of the Python and C/C++ datasets
  This folder contains the source code of the Python and C/C++ datasets, used to construct prompts and to run the baseline static analysis tools.
- prompts for the Python and C/C++ datasets
  This folder contains the prompts constructed from the source code of the Python and C/C++ datasets based on the five prompt templates.
- responses of LLMs and baselines
  This folder contains the responses generated by LLMs for each prompt and the analysis results of the baseline tools. For CodeQL, you need to upload results.sarif to GitHub (https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/uploading-a-sarif-file-to-github) to view the analysis results. For SonarQube, you need to import the export file into an Enterprise Edition or higher instance of the same version (v10.5 in our work) with a similar configuration (the default configuration in our work) to view the analysis results.
- entropy_calculation.py
  This Python script calculates the average entropy of each LLM-prompt combination to measure the consistency of LLM responses across three repeated experiments.
- Data Labelling for the C/C++ Dataset.xlsx
- Data Labelling for the Python Dataset.xlsx
  The two Microsoft (MS) Excel files contain the labelling results for LLMs and baselines on the C/C++ and Python datasets, including the category of each response generated by an LLM for each prompt, as well as the category of each analysis result generated by a baseline tool for each code file. The four categories (i.e., Instrumental, Helpful, Misleading and Uncertain) are defined in Section 3.3.3 of our paper as the labelling criteria.

How to read the MS Excel files: Both MS Excel files contain 5 sheets. The first sheet ('all_c++_data' or 'all_python_data') includes the information of all data in each dataset. The sheets 'first round', 'second round' and 'third round' contain the labelling results for LLMs under the five prompts in the three repeated experiments. The sheet 'Baselines' includes the labelling results for the baseline tools.

Column descriptions:
- File ID: the identifier of each code file in our dataset.
- Security Defect: the security defect(s) that the code file contains.
- Project: the source project of the code file.
- Suffix: the suffix of the code file.

2. RQ2 Quality Problem in Responses

- data_analysis_first_round.mx22
- data_analysis_second_round.mx22
- data_analysis_third_round.mx22
  These three MAXQDA project files contain the results of data extraction for quality problems present in the responses generated by the best-performing LLM-prompt combination across the three repeated experiments. The files can be opened with MAXQDA 2022 or a later version, available for download at https://www.maxqda.com/. You may also use the free 14-day trial version of MAXQDA 2024, available for download at https://www.maxqda.com/trial.

3. RQ3 Factor influencing LLMs

This folder contains two sub-folders:
- Step 1 - correlation analysis
  Files in this subfolder are for conducting correlation analysis of the explanatory variables through a Python script.
- Step 2 - redundancy analysis and model fitting
  Files in this subfolder are for conducting redundancy analysis, allocation of degrees of freedom, model fitting and evaluation through an R script. Detailed instructions for running the R script can be found in readme.md in this subfolder.
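
The consistency measure computed by entropy_calculation.py can be illustrated with a minimal sketch. This is not the script shipped in the package: it assumes that, for each code file, the three category labels from the three rounds are treated as a distribution, that Shannon entropy (in bits) is computed per file, and that the result is averaged over files for one LLM-prompt combination. The function names, file IDs and labels below are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (bits) of the category labels given to one code file."""
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1/p) summed over the observed categories
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def average_entropy(rounds):
    """Average per-file entropy for one LLM-prompt combination.

    rounds: list of dicts, one per repetition, mapping file ID -> category.
    """
    file_ids = rounds[0].keys()
    entropies = [label_entropy([r[fid] for r in rounds]) for fid in file_ids]
    return sum(entropies) / len(entropies)


# Hypothetical labels for two code files over three repetitions.
rounds = [
    {"file_1": "Instrumental", "file_2": "Helpful"},
    {"file_1": "Instrumental", "file_2": "Helpful"},
    {"file_1": "Instrumental", "file_2": "Misleading"},
]
avg = average_entropy(rounds)
print(f"average entropy: {avg:.3f} bits")
```

Under these assumptions, an average of 0 means every file received the same category in all three rounds, and larger values indicate less consistent responses.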

Created:

2024-09-17
