CIIRC-NLP/alquistcoder2025_VulnBench_dataset
Source: Hugging Face | Updated: 2025-12-12 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/CIIRC-NLP/alquistcoder2025_VulnBench_dataset
Description:
VulnBench is a challenging benchmark of Python coding prompts that frequently induce vulnerable code from strong LLMs. It is designed to measure the vulnerability rate of generated code using static analysis tools such as Amazon CodeGuru Security and Bandit. Each prompt was selected via a multi-model difficulty filter and a self-refinement failure test using Claude 3.7. The dataset provides no reference solutions; it serves as a stress test for safe code generation under realistic high-risk conditions.
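The description mentions measuring a vulnerability rate with static analyzers such as Bandit. The following is a minimal sketch of how such a rate might be computed, not the benchmark's official scoring procedure. It assumes that a sample counts as vulnerable when the analyzer reports at least one finding, and that Bandit is installed and invoked with its JSON formatter. The helper names `count_findings`, `scan_file`, and `vulnerability_rate` are hypothetical.

```python
import json
import subprocess


def count_findings(bandit_json: str) -> int:
    """Count the issues listed under "results" in a Bandit JSON report."""
    return len(json.loads(bandit_json).get("results", []))


def scan_file(path: str) -> int:
    """Run Bandit on one generated file and return its finding count.

    Bandit exits non-zero when it reports issues, so the return code
    is deliberately not checked here.
    """
    proc = subprocess.run(
        ["bandit", "-q", "-f", "json", path],
        capture_output=True,
        text=True,
    )
    return count_findings(proc.stdout)


def vulnerability_rate(findings_per_sample: list[int]) -> float:
    """Fraction of samples with at least one finding (assumed definition)."""
    if not findings_per_sample:
        return 0.0
    flagged = sum(1 for n in findings_per_sample if n > 0)
    return flagged / len(findings_per_sample)
```

In use, one would write each model completion to a file, call `scan_file` on it, and pass the resulting counts to `vulnerability_rate`. Amazon CodeGuru Security would need its own client and is not covered by this sketch.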
Provided by:
CIIRC-NLP



