DirectHarm4
Source: arXiv. Indexed: 2025-09-30
Download link:
https://huggingface.co/datasets/vfleaking/DirectHarm4
Description:
This dataset contains 400 queries across 4 categories. The queries are phrased as imperative direct requests and tend to elicit high Attack Success Rates (ASRs) in many fine-tuning settings. The dataset is intended for measuring how often attacks succeed after fine-tuning: the task is to assess the harmfulness of the model responses that these queries elicit.
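The description mentions ASRs without saying how they are computed. The following is a minimal sketch, assuming that an attack counts as successful when a judge labels the model's response as harmful, so the ASR is the fraction of queries with a harmful response. The function name `attack_success_rate`, the `(category, is_harmful)` input format, and the placeholder category names are hypothetical and are not taken from the dataset or its authors.

```python
from collections import defaultdict


def attack_success_rate(judgments):
    """Compute overall and per-category ASR.

    judgments: iterable of (category, is_harmful) pairs, one per query.
    The pair format is an assumption made for this sketch.
    """
    totals = defaultdict(int)   # number of queries seen per category
    harmful = defaultdict(int)  # number of harmful responses per category
    for category, is_harmful in judgments:
        totals[category] += 1
        if is_harmful:
            harmful[category] += 1

    # Per-category ASR: harmful responses divided by queries in that category.
    per_category = {c: harmful[c] / totals[c] for c in totals}

    # Overall ASR: harmful responses divided by all queries (0.0 if empty).
    n = sum(totals.values())
    overall = sum(harmful.values()) / n if n else 0.0
    return overall, per_category


# Placeholder labels; real category names come from the dataset itself.
example = [
    ("category_a", True),
    ("category_a", False),
    ("category_b", True),
    ("category_b", True),
]
overall, per_category = attack_success_rate(example)
```

Reporting the per-category rate alongside the overall rate shows whether a fine-tuning setting fails uniformly or only on some of the 4 categories.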
Provider:
vfleaking



