PromptAnalysis
Source: DataCite Commons. Updated 2026-04-28; indexed 2026-05-04.
Download link:
https://data.mendeley.com/datasets/9vdbnhc84j
Description:
Dataset Descriptions
1. Embedded_System_Security_-_SmallLLM.csv
Shape: 1,213 rows × 7 columns
This dataset captures security vulnerability analyses of embedded system GitHub repositories, evaluated by two small-scale large language models: Gemma and Phi-3 (P 3). Each row represents one repository, identified by a keyword category (e.g., "Smart Irrigation System") and a GitHub URL. The two LLM columns contain structured JSON outputs describing each detected vulnerability: vulnerability ID, CWE type, OWASP category, CVSS score, affected file/line, code snippet, exploit scenario, and fix recommendations. Additional columns record the vulnerabilities the models missed ("Misses By LLM") and a cumulative vulnerability summary ("Total"). The dataset is suited to benchmarking small LLMs on security-focused code review tasks.
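As a rough illustration of how the JSON columns described above might be consumed, the sketch below tallies findings by CWE type for one model column. The column name `Gemma`, the JSON keys `cwe` and `cvss_score`, and the sample rows are assumptions made for this example; the real header and key names should be checked against the file itself.

```python
import csv
import io
import json
from collections import Counter

# Made-up sample standing in for the real CSV. Column names other than
# "Misses By LLM" and "Total", and all JSON keys, are hypothetical.
SAMPLE_CSV = '''Keyword,GitHub URL,Gemma,Misses By LLM,Total
Smart Irrigation System,https://github.com/example/repo,"[{""cwe"": ""CWE-798"", ""cvss_score"": 7.5}, {""cwe"": ""CWE-120"", ""cvss_score"": 9.1}]",,2
Smart Lock,https://github.com/example/other,not valid json,,0
'''

def parse_findings(cell):
    """Return the finding dicts stored in one model-output cell, or [] if unparsable."""
    try:
        data = json.loads(cell)
    except (TypeError, json.JSONDecodeError):
        return []
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of a list
    if not isinstance(data, list):
        return []
    return [item for item in data if isinstance(item, dict)]

def cwe_counts(rows, model_column):
    """Count findings per CWE type across all rows for one model column."""
    counts = Counter()
    for row in rows:
        for finding in parse_findings(row.get(model_column, "")):
            counts[finding.get("cwe", "unknown")] += 1
    return counts

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(cwe_counts(rows, "Gemma"))
```

Cells that fail to parse are treated as containing no findings, which keeps the tally running over malformed model output; whether that is the right policy depends on how the real file encodes empty or failed analyses.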
2. Embedded_System_Security_-_LargeLLM.csv
Shape: 1,664 rows × 7 columns
Structurally similar to the SmallLLM dataset, this file covers the same type of embedded system repositories but evaluates them with two large-scale LLMs: GPT (likely GPT-4) and Gemini. Columns follow the same pattern: keyword, repository name, GitHub URL, structured JSON vulnerability reports per model, missed vulnerabilities, and totals. Because it has more rows than the SmallLLM dataset, it likely covers a broader set of repositories. Together, the two files enable direct performance comparison between small and large LLMs on embedded security vulnerability detection.
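One way such a small-versus-large comparison could be set up is to pair rows by GitHub URL and compare the number of findings each model reported. This is only a sketch: it assumes the two files share at least some repository URLs (the description does not confirm this), and the column names `Gemma` and `GPT` and the JSON key `cwe` are hypothetical.

```python
import json

def finding_count(cell):
    """Count findings in one JSON cell; unparsable cells count as zero."""
    try:
        data = json.loads(cell)
    except (TypeError, json.JSONDecodeError):
        return 0
    if isinstance(data, list):
        return len(data)
    return 1 if isinstance(data, dict) else 0

def compare_on_shared_repos(small_rows, large_rows, small_col, large_col,
                            url_col="GitHub URL"):
    """Map each URL present in both files to (small-model count, large-model count)."""
    small_by_url = {row[url_col]: finding_count(row.get(small_col))
                    for row in small_rows}
    shared = {}
    for row in large_rows:
        url = row[url_col]
        if url in small_by_url:
            shared[url] = (small_by_url[url], finding_count(row.get(large_col)))
    return shared

# Made-up rows for illustration; not taken from the dataset.
small_rows = [
    {"GitHub URL": "https://github.com/example/a", "Gemma": '[{"cwe": "CWE-798"}]'},
    {"GitHub URL": "https://github.com/example/b", "Gemma": "[]"},
]
large_rows = [
    {"GitHub URL": "https://github.com/example/a",
     "GPT": '[{"cwe": "CWE-798"}, {"cwe": "CWE-120"}]'},
    {"GitHub URL": "https://github.com/example/c", "GPT": "[]"},
]
print(compare_on_shared_repos(small_rows, large_rows, "Gemma", "GPT"))
```

Raw finding counts say nothing about correctness, so a fuller comparison would also draw on the missed-vulnerability column that both files provide.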
3. EVDD_-_Static_Analysis.csv
Shape: 1,026 rows × 8 columns
This dataset is a C/C++ vulnerability code corpus built around real-world CVEs (Common Vulnerabilities and Exposures). Each row corresponds to a specific CVE entry and includes the year, vulnerability type (e.g., Overflow, DoS, Privilege Escalation), CVE ID, bad (vulnerable) code, good (patched) code, a combined code snippet showing both versions side-by-side, and the outputs of two static analysis tools, Cppcheck and Flawfinder, run against those snippets. It is designed for evaluating and benchmarking static analysis tools, and could also be used to train or test models on vulnerability detection, patch generation, or tool effectiveness analysis using real CVE-grounded examples.
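For patch-oriented uses of the bad/good code pairs, a unified diff is a convenient way to see what a fix changed. The sketch below uses Python's standard `difflib` on a made-up pair; the code strings are invented for illustration and do not come from the dataset.

```python
import difflib

def patch_diff(bad_code, good_code):
    """Return a unified diff that turns the vulnerable code into the patched code."""
    diff = difflib.unified_diff(
        bad_code.splitlines(),
        good_code.splitlines(),
        fromfile="bad",
        tofile="good",
        lineterm="",
    )
    return "\n".join(diff)

# Made-up pair for illustration; not taken from the dataset.
bad = "char buf[8];\nstrcpy(buf, input);"
good = ("char buf[8];\n"
        "strncpy(buf, input, sizeof(buf) - 1);\n"
        "buf[sizeof(buf) - 1] = '\\0';")
print(patch_diff(bad, good))
```

Lines prefixed with `-` come from the vulnerable version and lines prefixed with `+` from the patched one, so the diff can be set beside the Cppcheck and Flawfinder outputs to see whether a tool flagged the lines the patch actually touched.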
Provider:
Mendeley Data
Created:
2026-04-28



