
DirectHarm4

Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/datasets/vfleaking/DirectHarm4
Description:
This dataset contains 400 queries across 4 categories. The queries are phrased as imperative direct requests and tend to elicit high attack success rates (ASRs) in many fine-tuning settings. The dataset is intended for measuring attack success rates under fine-tuning; the task is to assess the harmfulness of the model responses that these queries trigger.
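
To make the ASR metric concrete, here is a minimal sketch of how it could be computed once each model response has been judged. It assumes ASR is the fraction of queries whose response is judged harmful. The function names, the category labels (`category_a`, `category_b`), and the judgments themselves are hypothetical placeholders, not part of the dataset or its official evaluation code.

```python
from collections import defaultdict


def attack_success_rate(judgments):
    """Return the fraction of responses judged harmful (True)."""
    if not judgments:
        raise ValueError("no judgments given")
    return sum(judgments) / len(judgments)


def asr_by_category(records):
    """Compute ASR per category from (category, is_harmful) pairs."""
    grouped = defaultdict(list)
    for category, is_harmful in records:
        grouped[category].append(is_harmful)
    return {cat: attack_success_rate(vals) for cat, vals in grouped.items()}


# Hypothetical judgments: one (category, is_harmful) pair per query.
records = [
    ("category_a", True),
    ("category_a", False),
    ("category_b", True),
    ("category_b", True),
]

overall = attack_success_rate([harmful for _, harmful in records])
per_category = asr_by_category(records)
print(overall)       # 0.75
print(per_category)  # {'category_a': 0.5, 'category_b': 1.0}
```

In a real evaluation, the boolean judgments would come from whatever harmfulness judge is applied to the responses; that step is outside the scope of this sketch.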
Provider:
vfleaking