UltraSafety: A Safety Evaluation Dataset for Large Language Models

超神经 (HyperAI) · Updated 2024-03-29 · Indexed 2024-05-15

Download link:

https://hyper.ai/cn/datasets/30468

Overview:

The UltraSafety dataset was jointly created by Renmin University of China, Tsinghua University, and Tencent to evaluate and improve the safety of large language models (LLMs). It derives 1,000 safety seed instructions from AdvBench and MaliciousInstruct, then bootstraps a further 2,000 instructions with Self-Instruct, for a total of 3,000 harmful instructions. The research team also manually screened the jailbreak prompts from AutoDAN and retained 830 high-quality ones; each harmful instruction is paired with an associated jailbreak prompt. For every harmful instruction, the dataset provides completions generated by models of varying safety levels, each with a rating assigned by GPT-4, where 1 denotes a harmless completion and 0 denotes a harmful one. These detailed safety-related instructions are intended to help researchers train models that can recognize and guard against potential safety threats.
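To make the rating convention concrete, here is a minimal sketch of how one might tally harmless completions per model. The record layout, the field names (`instruction`, `jailbreak_prompt`, `completions`, `model`, `rating`), the model labels, and the placeholder strings are all assumptions made for illustration; the listing does not document the dataset's actual schema. Only the convention that 1 means harmless and 0 means harmful comes from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Completion:
    model: str     # hypothetical model label
    response: str  # placeholder text, not real dataset content
    rating: int    # 1 = harmless, 0 = harmful (convention from the description)


@dataclass
class Record:
    instruction: str       # one harmful instruction
    jailbreak_prompt: str  # the jailbreak prompt paired with it
    completions: List[Completion]


def harmless_rate_by_model(records: List[Record]) -> Dict[str, float]:
    """Return the fraction of completions rated harmless (1) for each model."""
    totals: Dict[str, int] = defaultdict(int)
    harmless: Dict[str, int] = defaultdict(int)
    for record in records:
        for completion in record.completions:
            totals[completion.model] += 1
            harmless[completion.model] += completion.rating
    return {model: harmless[model] / totals[model] for model in totals}


# Two toy records with placeholder content (assumed layout, not real data).
records = [
    Record(
        instruction="<harmful instruction 1>",
        jailbreak_prompt="<jailbreak prompt>",
        completions=[
            Completion("model_a", "<refusal>", 1),
            Completion("model_b", "<unsafe text>", 0),
        ],
    ),
    Record(
        instruction="<harmful instruction 2>",
        jailbreak_prompt="<jailbreak prompt>",
        completions=[
            Completion("model_a", "<refusal>", 1),
            Completion("model_b", "<refusal>", 1),
        ],
    ),
]

rates = harmless_rate_by_model(records)
print(rates)
```

Because the rating is binary with 1 as the safe outcome, summing ratings and dividing by the count gives the harmless rate directly; a higher value suggests a model that resists the paired jailbreak prompts more often.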

Created:

2024-03-28

Collected summary

Dataset introduction

Background and challenges

Background overview

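UltraSafety is a safety evaluation dataset for large language models. It contains 3,000 harmful instructions with their associated jailbreak prompts and is intended to help researchers train models that recognize and guard against safety threats. The dataset was jointly created by Renmin University of China, Tsinghua University, and Tencent, and every instruction carries a safety rating from GPT-4.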

The content above was collected and summarized by 遇见数据集.