five

PromptAnalysis

DataCite Commons | Updated 2026-04-28 | Indexed 2026-05-04
Download link:
https://data.mendeley.com/datasets/9vdbnhc84j
Description:
Dataset Descriptions

1. Embedded_System_Security_-_SmallLLM.csv
Shape: 1,213 rows × 7 columns
This dataset captures security vulnerability analyses of embedded system GitHub repositories, evaluated by two small-scale LLMs: Gemma and Phi-3 (P 3). Each row represents one repository, identified by a keyword category (e.g., "Smart Irrigation System") and a GitHub URL. The two LLM columns contain structured JSON outputs detailing detected vulnerabilities, including vulnerability ID, CWE type, OWASP category, CVSS score, affected file/line, code snippet, exploit scenario, and fix recommendations. Additional columns record vulnerabilities the models missed ("Misses By LLM") and a cumulative Total vulnerability summary. This dataset is suited for benchmarking small LLMs on security-focused code review tasks.

2. Embedded_System_Security_-_LargeLLM.csv
Shape: 1,664 rows × 7 columns
Structurally similar to the SmallLLM dataset, this file evaluates the same type of embedded system repositories using two large-scale LLMs: GPT (likely GPT-4) and Gemini. Columns follow the same pattern: keyword, repository name, GitHub URL, structured JSON vulnerability reports per model, missed vulnerabilities, and totals. With more rows than the SmallLLM dataset, it likely covers a broader set of repositories. Together with the SmallLLM file, this dataset enables direct performance comparison between small and large LLMs on embedded security vulnerability detection.

3. EVDD_-_Static_Analysis.csv
Shape: 1,026 rows × 8 columns
This dataset is a C/C++ vulnerability code corpus built around real-world CVEs (Common Vulnerabilities and Exposures). Each row corresponds to a specific CVE entry and includes the year, vulnerability type (e.g., Overflow, DoS, Privilege Escalation), CVE ID, bad (vulnerable) code, good (patched) code, a combined code snippet showing both versions side by side, and the outputs of two static analysis tools, Cppcheck and Flawfinder, run against those snippets. This dataset is designed for evaluating and benchmarking static analysis tools, and potentially for training or testing models on vulnerability detection, patch generation, or tool effectiveness analysis using real CVE-grounded examples.
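Since the LLM columns are described as holding structured JSON reports, a first pass over either Embedded_System_Security file might count how many findings each model reported. The following is a minimal sketch only: the column names ("Keyword", "GitHub URL", "Gemma", "Phi-3"), the JSON keys ("cwe", "cvss"), and the example repository URLs are assumptions made for illustration, and the real files may use different headers and report layouts. It runs against a small inline sample instead of the actual CSV.

```python
import csv
import io
import json

# Hypothetical sample mimicking the described layout.
# Real column names, JSON keys, and URLs in the dataset may differ.
SAMPLE_CSV = '''Keyword,GitHub URL,Gemma,Phi-3
Smart Irrigation System,https://github.com/example/repo-a,"[{""cwe"": ""CWE-120"", ""cvss"": 7.5}]","[]"
Smart Lock,https://github.com/example/repo-b,"[{""cwe"": ""CWE-798"", ""cvss"": 9.1}, {""cwe"": ""CWE-20"", ""cvss"": 5.3}]",not valid json
'''


def parse_report(cell):
    """Return a list of finding dicts, or an empty list if the cell is not valid JSON."""
    try:
        data = json.loads(cell)
    except (json.JSONDecodeError, TypeError):
        return []
    if isinstance(data, dict):
        # A single report object is treated as one finding.
        return [data]
    if isinstance(data, list):
        return [item for item in data if isinstance(item, dict)]
    return []


def count_findings(csv_text, model_columns):
    """Count parsed findings per model column across all rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {col: 0 for col in model_columns}
    for row in reader:
        for col in model_columns:
            counts[col] += len(parse_report(row.get(col, "")))
    return counts


counts = count_findings(SAMPLE_CSV, ["Gemma", "Phi-3"])
print(counts)
```

To try this on the downloaded file, the inline sample could be replaced with the file's text and the column list adjusted to the actual headers. Cells that fail to parse are counted as zero findings here, which may undercount a model whose output is malformed, so tracking parse failures separately would be a reasonable refinement.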
Provider:
Mendeley Data
Created:
2026-04-28