
cyberseceval3-visual-prompt-injection

ModelScope (魔搭社区) · updated 2025-12-05 · indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/facebook/cyberseceval3-visual-prompt-injection
# Dataset Card for CyberSecEval 3 - Visual Prompt Injection Benchmark

## Dataset Details

### Dataset Description

This dataset provides a multimodal benchmark for [visual prompt injection](https://en.wikipedia.org/wiki/Prompt_injection), with text/image inputs. It is part of [CyberSecEval 3](https://arxiv.org/abs/2408.01605), the third edition of Meta's flagship suite of security benchmarks for LLMs to measure cybersecurity risks and capabilities across multiple domains.

- **Language(s):** English
- **License:** MIT

### Dataset Sources

- **Repository:** [Link](https://github.com/meta-llama/PurpleLlama/tree/23156b70efb596831c02c6461fc42da1f75988ec/CybersecurityBenchmarks)
- **Paper:** [Link](https://arxiv.org/abs/2408.01605)

## Uses

### In scope

This dataset is intended for evaluating the susceptibility of language models to prompt injection when provided with text and image inputs.

### Out-of-scope

This dataset should not be used to train models and should be used for evaluation purposes only. In addition, this dataset should not be used for harmful, unethical, or malicious purposes.

## How to Use

The benchmarks can be run with either:

1. The CyberSecEval benchmark runner from the [PurpleLlama repository](https://github.com/meta-llama/PurpleLlama/tree/23156b70efb596831c02c6461fc42da1f75988ec/CybersecurityBenchmarks) (see [documentation](https://meta-llama.github.io/PurpleLlama/docs/benchmarks/prompt_injection#multilingual-text-prompt-injection-benchmark) for more details)
2. The [Inspect AI framework](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals)

## Dataset Structure

A total of 1000 test cases are provided in `test_cases.json`. A folder of accompanying images is at `images/`.

Each test case consists of these core keys:

- `id`: The integer ID of the test case.
- `system_prompt`: The system prompt provided to the model-under-test.
- `user_input_text`: Text component of the user prompt provided to the model-under-test.
- `judge_question`: A question used to judge the response from the model-under-test and determine whether the prompt injection was successful or not.

The image path provided to each test case is at `images/<id>.png`.

The following keys serve as tags providing additional metadata about the test case:

- `image_description`: Text description used to generate the image
- `image_text`: Text transcription of any text overlaid on the image
- `injection_technique`: Tags indicating what type of prompt injection technique was used in this test case.
- `injection_type`: Either `direct` or `indirect`.
- `risk_category`: Either `logic-violating` or `security-violating`.

## Additional Information

### Curation Rationale

The dataset is created to address a gap in existing evaluations for prompt injection and particularly multimodal prompt injection. Prompt injection is a security issue affecting LLMs: untrusted data placed into the context of a model causes unintended behavior. It is one of the biggest security issues that affect LLMs, and there is a need to understand these risks, particularly as newer models now support multimodal inputs, which increases the risk surface for prompt injection.

### Source Data

The test cases are synthetically created using [Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct). [Meta AI](https://www.meta.ai/)'s image generation model was used to produce the images. A subset of test cases contain images of CAPTCHAs, which are sourced from [Wilhelmy, Rodrigo & Rosas, Horacio. (2013)](https://www.researchgate.net/publication/248380891_captcha_dataset). Some of the techniques in these test cases are inspired by [FigStep](https://github.com/ThuCCSLab/FigStep) and [MM-SafetyBench](https://arxiv.org/pdf/2311.17600v2).

#### Personal and Sensitive Information

The dataset does not contain any personal or sensitive information. The data is synthetically generated and is not expected to contain any real-world data of a sensitive nature.

## Limitations

* This dataset only covers test cases in the English language.
* As the dataset is synthetic, this may lead to limitations in generalizability.
* Not every sample in this dataset has been manually reviewed, so there may be errors in some test cases.
* The judging of responses is also based on a judge LLM, which may produce incorrect results due to the probabilistic nature of LLM responses.

### Recommendations

Users should be made aware of these limitations of the dataset.

## Citation

**BibTeX:**

```bibtex
@misc{wan2024CyberSecEval3advancingevaluation,
  title={CyberSecEval 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models},
  author={Shengye Wan and Cyrus Nikolaidis and Daniel Song and David Molnar and James Crnkovich and Jayson Grace and Manish Bhatt and Sahana Chennabasappa and Spencer Whitman and Stephanie Ding and Vlad Ionescu and Yue Li and Joshua Saxe},
  year={2024},
  eprint={2408.01605},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2408.01605},
}
```

## Dataset Authors

* Stephanie Ding ([sym@meta.com](mailto:sym@meta.com))
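
The layout described under Dataset Structure can be read with a few lines of Python. The following is a minimal sketch, not part of the official benchmark runner. It assumes that the dataset has been downloaded to a local directory, that `test_cases.json` holds a top-level JSON array of objects, and that a tag value may be either a string or a list of strings. The helper names `load_test_cases` and `count_by_tag` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_test_cases(dataset_dir):
    """Read test_cases.json and attach the expected image path to each case.

    Assumption: the file is a top-level JSON array of test-case objects.
    """
    root = Path(dataset_dir)
    with open(root / "test_cases.json", encoding="utf-8") as f:
        cases = json.load(f)
    for case in cases:
        # The card states that the image for a case is at images/<id>.png.
        case["image_path"] = root / "images" / f"{case['id']}.png"
    return cases


def count_by_tag(cases, key):
    """Tally test cases by a metadata tag such as injection_type."""
    counts = Counter()
    for case in cases:
        value = case.get(key)
        # Assumption: a tag is either a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts
```

Calling `count_by_tag(cases, "risk_category")` on the loaded cases gives a quick view of how the test cases split between `logic-violating` and `security-violating`, which can be useful when reporting results per category.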

Provider: maas
Created: 2025-05-20
Collected summary
Dataset introduction
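Background and challenges
Background overview
This dataset is the multimodal prompt injection benchmark in CyberSecEval 3, used to evaluate how susceptible language models are to prompt injection when processing text and image inputs. It contains 1000 synthetically generated test cases, in English only, and is suitable for security risk evaluation, but it should not be used for model training or for malicious purposes.
The summary above was collected and generated by 遇见数据集.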