PromptAnalysis

Name: PromptAnalysis
Creator: Mendeley Data
Published: 2026-04-28 16:48:37
License: No description available

Source: DataCite Commons. Updated 2026-04-28; indexed 2026-05-04.

Download link:

https://data.mendeley.com/datasets/9vdbnhc84j

Description:

Dataset Descriptions

1. Embedded_System_Security_-_SmallLLM.csv
Shape: 1,213 rows × 7 columns
This dataset captures security vulnerability analyses of embedded-system GitHub repositories, as evaluated by two small-scale LLMs: Gemma and Phi-3 (P 3). Each row represents one repository, identified by a keyword category (e.g., "Smart Irrigation System") and a GitHub URL. The two LLM columns contain structured JSON outputs describing the detected vulnerabilities, including vulnerability ID, CWE type, OWASP category, CVSS score, affected file/line, code snippet, exploit scenario, and fix recommendations. Additional columns record the vulnerabilities the models missed ("Misses By LLM") and a cumulative vulnerability summary ("Total"). The dataset is suited to benchmarking small LLMs on security-focused code review.

2. Embedded_System_Security_-_LargeLLM.csv
Shape: 1,664 rows × 7 columns
Structurally similar to the SmallLLM dataset, this file evaluates the same type of embedded-system repositories using two large-scale LLMs: GPT (likely GPT-4) and Gemini. The columns follow the same pattern: keyword, repository name, GitHub URL, a structured JSON vulnerability report per model, missed vulnerabilities, and totals. With more rows than the SmallLLM file (1,664 vs. 1,213), it likely covers a broader set of repositories. Together, the two files enable a direct comparison of small and large LLMs on embedded-system vulnerability detection.

3. EVDD_-_Static_Analysis.csv
Shape: 1,026 rows × 8 columns
This dataset is a C/C++ vulnerable-code corpus built around real-world CVEs (Common Vulnerabilities and Exposures). Each row corresponds to one CVE entry and includes the year, the vulnerability type (e.g., Overflow, DoS, Privilege Escalation), the CVE ID, the bad (vulnerable) code, the good (patched) code, a combined snippet showing both versions side by side, and the outputs of two static analysis tools, Cppcheck and Flawfinder, run against those snippets. It is designed for evaluating and benchmarking static analysis tools, and potentially for training and testing models on vulnerability detection, patch generation, or tool-effectiveness analysis, using real CVE-grounded examples.
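For the SmallLLM and LargeLLM files, a minimal sketch of how one model's JSON report cell might be summarized so that the small and large models can be compared on the same footing. The exact JSON schema and column headers are not documented here, so the keys `vulnerabilities`, `cwe` and `cvss_score`, the model names and the sample cells are all hypothetical; adjust them to the real files.

```python
import json
from collections import Counter

# Hypothetical sample cells. The keys "vulnerabilities", "cwe" and "cvss_score"
# are assumptions; the real schema in the CSV files may differ.
SAMPLE_REPORTS = {
    "ModelA": '{"vulnerabilities": [{"id": "V1", "cwe": "CWE-120", "cvss_score": 7.5},'
              ' {"id": "V2", "cwe": "CWE-798", "cvss_score": 9.8}]}',
    "ModelB": '{"vulnerabilities": [{"id": "V1", "cwe": "CWE-120", "cvss_score": 7.5}]}',
}


def parse_report(cell) -> list[dict]:
    """Return the findings in one model's JSON cell (empty list on bad or missing JSON)."""
    try:
        data = json.loads(cell)
    except (json.JSONDecodeError, TypeError):  # TypeError covers None / NaN cells
        return []
    findings = data.get("vulnerabilities", []) if isinstance(data, dict) else []
    return [f for f in findings if isinstance(f, dict)]


def summarize(cell) -> dict:
    """Count findings, tally CWE types and take the highest CVSS score."""
    findings = parse_report(cell)
    cwes = Counter(f.get("cwe", "unknown") for f in findings)
    scores = [float(f["cvss_score"]) for f in findings if "cvss_score" in f]
    return {
        "count": len(findings),
        "cwe_counts": dict(cwes),
        "max_cvss": max(scores) if scores else None,
    }


if __name__ == "__main__":
    for model, cell in SAMPLE_REPORTS.items():
        print(model, summarize(cell))
```

With pandas, the same function could be mapped over a model column, e.g. `pd.read_csv("Embedded_System_Security_-_SmallLLM.csv")[col].map(summarize)`, where `col` stands for whichever header holds a model's output. Parsing failures return an empty summary rather than raising, since some cells may be blank.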
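For the EVDD file, a minimal sketch of recovering the patch from a bad/good code pair with Python's standard `difflib`. The C snippet and the `CVE-XXXX-XXXX` placeholder are hypothetical, not rows from the dataset; only the idea of comparing the vulnerable and patched versions comes from the description above.

```python
import difflib

# Hypothetical bad/good pair; real rows store the vulnerable and patched code for one CVE.
BAD = """void copy(char *dst, const char *src) {
    strcpy(dst, src);
}
"""
GOOD = """void copy(char *dst, const char *src, size_t n) {
    strncpy(dst, src, n - 1);
    dst[n - 1] = '\\0';
}
"""


def patch_diff(bad: str, good: str, cve_id: str = "CVE-XXXX-XXXX") -> str:
    """Return a unified diff that turns the vulnerable code into the patched code."""
    diff = difflib.unified_diff(
        bad.splitlines(keepends=True),
        good.splitlines(keepends=True),
        fromfile=f"{cve_id}/bad.c",
        tofile=f"{cve_id}/good.c",
    )
    return "".join(diff)


def changed_lines(bad: str, good: str) -> tuple[int, int]:
    """Count removed and added lines using SequenceMatcher opcodes."""
    a, b = bad.splitlines(), good.splitlines()
    removed = added = 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return removed, added


if __name__ == "__main__":
    print(patch_diff(BAD, GOOD))
    print("removed/added:", changed_lines(BAD, GOOD))
```

Counting through opcodes rather than by scanning for `-`/`+` prefixes avoids miscounting code lines that themselves begin with `--` or `++`. Writing each snippet to a file and passing it to `cppcheck` or `flawfinder` on the command line would produce the kind of tool output stored in the last two columns, though the options the dataset authors used are not stated.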

Provider:

Mendeley Data

Created:

2026-04-28
