aisi-whitebox/wmdp_bio_prompted_sandbagging_llama_31_8b_instruct

Name: aisi-whitebox/wmdp_bio_prompted_sandbagging_llama_31_8b_instruct
Creator: aisi-whitebox
Published: 2025-04-09 18:49:21
License: No description available

Source: Hugging Face. Updated 2025-04-09; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/wmdp_bio_prompted_sandbagging_llama_31_8b_instruct

Description:

The inspect llama 31 8b instruct prompted sandbagging wmdp bio dataset is an evaluation dataset for detecting and assessing sandbagging, that is, a model deliberately underperforming. It was generated with the vllm/meta-llama/Llama-3.1-8B-Instruct model on the deception_sprint/wmdp_bio task. The dataset contains two system prompts: a benign prompt, and a malicious prompt that mimics a less capable AI in order to produce intentionally low-quality solutions. Sandbagging detection was enabled when the dataset was created, but sandbagging filtering was not applied. The dataset is divided into training, validation, and test sets; the split ratios are not stated.
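As a rough illustration of how paired benign and sandbagging runs might be compared, the sketch below computes accuracy per prompt type and the gap between them. The `Record` class, its field names, and the toy records are all hypothetical; they are not the dataset's actual schema or data, which this listing does not specify.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One hypothetical evaluation record (field names are assumptions)."""
    question_id: str
    prompt_type: str  # assumed labels: "benign" or "sandbagging"
    is_correct: bool


def accuracy_by_prompt(records):
    """Return a dict mapping each prompt type to its accuracy."""
    totals, hits = {}, {}
    for r in records:
        totals[r.prompt_type] = totals.get(r.prompt_type, 0) + 1
        hits[r.prompt_type] = hits.get(r.prompt_type, 0) + int(r.is_correct)
    return {k: hits[k] / totals[k] for k in totals}


def sandbagging_gap(records):
    """Benign accuracy minus sandbagging accuracy."""
    acc = accuracy_by_prompt(records)
    return acc["benign"] - acc["sandbagging"]


# Toy data for illustration only; not taken from the dataset.
records = [
    Record("q1", "benign", True),
    Record("q2", "benign", True),
    Record("q3", "benign", True),
    Record("q4", "benign", False),
    Record("q1", "sandbagging", True),
    Record("q2", "sandbagging", False),
    Record("q3", "sandbagging", False),
    Record("q4", "sandbagging", False),
]

gap = sandbagging_gap(records)
```

A large positive gap on the same questions would be consistent with the sandbagging prompt lowering performance; on real data the records would need to be mapped from whatever columns the dataset actually provides.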

Provider:

aisi-whitebox
