shiv96/AdvBench_safe_unsafe_responses_activations_metrics

Name: shiv96/AdvBench_safe_unsafe_responses_activations_metrics
Creator: shiv96
Published: 2025-10-15 20:30:21
License: No description available

Hugging Face. Updated 2025-10-15. Indexed 2025-10-25.

Download link:

https://hf-mirror.com/datasets/shiv96/AdvBench_safe_unsafe_responses_activations_metrics

Description:

This dataset contains responses to instructions generated by the WizardLM-30B-Uncensored model. Each response is labeled as either Safe (label 0) or Unsafe (label 1). The dataset also includes per-layer activations from the llama-3.2-1B-Instruct model and the final-token representations of chat_template(prompt, response). In addition, it provides geometric measures computed to characterize the complexity of the data.
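As a rough illustration of what a "final token representation" per layer means, the sketch below picks the last token's vector from each layer of a nested list. The layout `hidden_states[layer][token]`, the function name `last_token_per_layer`, and the toy numbers are all assumptions made for illustration; the listing does not state the dataset's actual schema or column names.

```python
from typing import List

# Label convention stated in the description: 0 = Safe, 1 = Unsafe.
LABEL_NAMES = {0: "Safe", 1: "Unsafe"}

Vector = List[float]


def last_token_per_layer(hidden_states: List[List[Vector]]) -> List[Vector]:
    """Return the final token's vector from every layer.

    hidden_states[layer][token] is assumed to hold one hidden vector.
    This layout is hypothetical, not taken from the dataset itself.
    """
    if not hidden_states:
        raise ValueError("expected at least one layer")
    return [layer[-1] for layer in hidden_states]


# Toy input: 2 layers, 3 tokens, hidden size 2 (made-up values, not dataset values).
toy = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],
]
features = last_token_per_layer(toy)
```

Here `features` holds one vector per layer, which is the kind of per-layer feature a safe/unsafe classifier or a geometric measure could be computed on.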

Provided by:

shiv96
