RMCBench
Source: arXiv. Indexed 2025-09-30.
Download link:
https://huggingface.co/datasets/zhongqy/rmcbench
Overview:
RMCBench is presented as the first benchmark for evaluating how well large language models (LLMs) resist malicious code generation. It contains 473 prompts spanning two scenarios: text-to-code, where the model is asked to generate code from a text description, and code-to-code, where the model is asked to translate or complete existing malicious code. The benchmark highlights differences among models in how readily they refuse to generate malicious code.
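Since the description compares how often models refuse across the two scenarios, the following is a minimal sketch of how a per-scenario refusal rate could be tallied. It is not the benchmark's own evaluation code: the record fields `scenario` and `refused`, the function name `refusal_rate_by_scenario`, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def refusal_rate_by_scenario(records):
    """Return {scenario: fraction of responses labeled as refusals}.

    Each record is a dict with two assumed (hypothetical) fields:
    "scenario" (str) and "refused" (bool).
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for rec in records:
        totals[rec["scenario"]] += 1
        if rec["refused"]:
            refusals[rec["scenario"]] += 1
    return {scenario: refusals[scenario] / totals[scenario] for scenario in totals}


# Hand-written placeholder labels, NOT actual RMCBench results.
sample = [
    {"scenario": "text-to-code", "refused": True},
    {"scenario": "text-to-code", "refused": False},
    {"scenario": "code-to-code", "refused": False},
    {"scenario": "code-to-code", "refused": False},
]

rates = refusal_rate_by_scenario(sample)
for scenario, rate in rates.items():
    print(f"{scenario}: {rate:.0%} refused")
```

In practice the `refused` label would come from whatever judging procedure is applied to each model response; that step is outside the scope of this sketch.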



