Dataset for the Paper: "Challenges of Utilizing Large Language Models for Automated Security Code Review"

Indexed from the NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10579427
Description:
This is the dataset for the paper "Challenges of Utilizing Large Language Models for Automated Security Code Review". It corresponds to the three steps described in Section 3.3 of the paper. A brief description of the folder and the files follows.

1. Step 1 - Prompt and Response
This folder contains two subfolders: 'prompt' and 'response'. The 'prompt' folder contains all prompts constructed from 549 code files and five prompt templates. The 'response' folder contains all responses generated by three Large Language Models (LLMs), i.e., GPT-3.5, GPT-4, and Gemini Pro, in response to the five types of prompts.

2. Step 2 - Data Labelling for Calculating Performance.xlsx
This Excel file contains the data labelling results for all responses generated under each LLM-prompt combination. Following the classification defined in the evaluation method used in the study, the responses are labelled as one of four types (Instrumental, Helpful, Misleading, or Uncertain) so that performance scores can be calculated.

3. Step 3 - Data Extraction for Quality Problem.mx22
This MAXQDA project file contains the results of data extraction for quality problems present in 82 responses generated by the best-performing LLM-prompt combination. It can be opened with MAXQDA 2022 or a later version, available for download at https://www.maxqda.com/. The free 14-day trial version of MAXQDA 2024, available at https://www.maxqda.com/trial, can also be used.
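As a rough illustration of how the Step 2 labels might be summarized, the sketch below counts the four label types per LLM-prompt combination. It is only a sketch under stated assumptions: the function name `tally_labels`, the row layout `(llm, prompt, label)`, the prompt identifiers `P1` and `P2`, and the sample rows are all hypothetical and are not taken from the Excel file, whose actual sheet and column names are not described here. It also does not reproduce the performance scores defined in the paper.

```python
from collections import Counter, defaultdict

# The four label types named in the dataset description.
LABELS = ("Instrumental", "Helpful", "Misleading", "Uncertain")


def tally_labels(rows):
    """Count labels per (llm, prompt) combination.

    `rows` is an iterable of (llm, prompt, label) tuples. This layout is
    an assumption made for illustration, not the layout of the Excel file.
    """
    counts = defaultdict(Counter)
    for llm, prompt, label in rows:
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[(llm, prompt)][label] += 1
    # Report every label for every combination, including zero counts.
    return {
        combo: {name: counter[name] for name in LABELS}
        for combo, counter in counts.items()
    }


# Made-up sample rows; "P1" and "P2" are hypothetical prompt identifiers.
rows = [
    ("GPT-4", "P1", "Instrumental"),
    ("GPT-4", "P1", "Instrumental"),
    ("GPT-4", "P1", "Misleading"),
    ("Gemini Pro", "P2", "Uncertain"),
]

summary = tally_labels(rows)
for combo, per_label in summary.items():
    print(combo, per_label)
```

To use this with the real data, the rows would first have to be read from the Excel file with a spreadsheet library and mapped onto the assumed tuple layout.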
Created:
2024-01-29