Babelscape/ALERT

Name: Babelscape/ALERT
Creator: Babelscape
Published: 2024-06-20 07:30:37
License: CC BY-NC-SA 4.0

Source: Hugging Face. Updated 2024-06-20; indexed 2024-04-19.

Download link:

https://hf-mirror.com/datasets/Babelscape/ALERT

Description:

The ALERT benchmark is a large-scale benchmark for assessing the safety of Large Language Models (LLMs) through red teaming. It consists of two datasets: ALERT and ALERT_Adv. The ALERT dataset contains around 15K standard red-teaming prompts, each labeled with a category from the safety risk taxonomy. The ALERT_Adv dataset contains about 30K adversarial red-teaming prompts, each labeled with a category from the safety risk taxonomy and with the type of adversarial attack applied. Each record has the fields id, prompt, and category; records in the adversarial version also have an attack_type field. Most of the prompts are derived from the Anthropic HH-RLHF dataset, with further classification and augmentation applied. The dataset is licensed under CC BY-NC-SA 4.0 and contains content that may be offensive or upsetting.

Provider:

Babelscape

Summary of original information

Dataset overview

Dataset composition

Contains two datasets.
The dataset format is JSONL.
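
As a minimal sketch of how records in this layout could be read, the snippet below assumes each JSONL line is one JSON object with the id, prompt, and category fields from the description, plus attack_type in the adversarial set. The function name parse_alert_jsonl, the type of id, and the sample lines are hypothetical placeholders, not taken from the dataset itself.

```python
import json
from typing import Iterable, Iterator

# Fields the description says every record carries.
REQUIRED_FIELDS = ("id", "prompt", "category")


def parse_alert_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, checking the expected fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        # attack_type is only expected in the adversarial set; default to None.
        record.setdefault("attack_type", None)
        yield record


# Hypothetical sample lines; real prompts and labels are not reproduced here.
SAMPLE = [
    '{"id": 0, "prompt": "<placeholder prompt>", "category": "<category label>"}',
    '{"id": 1, "prompt": "<placeholder prompt>", "category": "<category label>", '
    '"attack_type": "<attack label>"}',
]

records = list(parse_alert_jsonl(SAMPLE))
print(len(records))
```

The same function accepts an open file handle in place of SAMPLE, since iterating a text file yields its lines.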

Collected summary

Dataset introduction

Background and challenges

Background overview

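ALERT is a comprehensive benchmark for evaluating the safety of large language models, designed around red teaming. It contains about 45K prompts, split into a standard subset and an adversarial subset. Each prompt is classified under a fine-grained safety risk taxonomy (6 coarse-grained and 32 fine-grained categories), with the aim of helping identify model weaknesses and guide safety improvements. The content may involve offensive or sensitive topics and should be used responsibly.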
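
To illustrate how the per-prompt labels could be used to see how prompts are distributed across the taxonomy, the sketch below tallies records by category and, where present, by attack type. The helper count_by_field and the labels cat_a, cat_b, and attack_x are hypothetical placeholders, not the real taxonomy or attack names.

```python
from collections import Counter


def count_by_field(records, field):
    """Count records per value of `field`, skipping records where it is absent or None."""
    return Counter(r[field] for r in records if r.get(field) is not None)


# Hypothetical records in the id / prompt / category (/ attack_type) layout.
records = [
    {"id": 0, "prompt": "<p0>", "category": "cat_a"},
    {"id": 1, "prompt": "<p1>", "category": "cat_a", "attack_type": "attack_x"},
    {"id": 2, "prompt": "<p2>", "category": "cat_b", "attack_type": "attack_x"},
]

by_category = count_by_field(records, "category")
by_attack = count_by_field(records, "attack_type")

print(by_category.most_common())
print(by_attack.most_common())
```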

The content above was collected and summarized by 遇见数据集.
