aisi-whitebox/wmdp_cyber_cot_prompted_sandbagging_llama_33_70b_instruct

Name: aisi-whitebox/wmdp_cyber_cot_prompted_sandbagging_llama_33_70b_instruct
Creator: aisi-whitebox
Published: 2025-04-22 15:43:40
License: No description available

Source: Hugging Face (updated 2025-04-22, indexed 2025-04-26)

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/wmdp_cyber_cot_prompted_sandbagging_llama_33_70b_instruct
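A minimal sketch of how the dataset might be loaded, assuming the third-party `datasets` library is installed and the repository is publicly readable. The helper names `dataset_url` and `load_sandbagging_dataset` are hypothetical and used only for illustration; the listing does not document split or column names, so the sketch does not assume any.

```python
# Repository ID as given in the listing above.
REPO_ID = "aisi-whitebox/wmdp_cyber_cot_prompted_sandbagging_llama_33_70b_instruct"


def dataset_url(repo_id: str, endpoint: str = "https://hf-mirror.com") -> str:
    """Build the dataset page URL for a Hub-style endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_sandbagging_dataset(repo_id: str = REPO_ID):
    """Load the dataset with the `datasets` library (hypothetical helper).

    The import is deferred so the URL helper works without the library.
    Assumption: the repo loads with default settings; inspect the returned
    object to discover the actual splits and columns.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id)


if __name__ == "__main__":
    print(dataset_url(REPO_ID))
```

Calling `load_sandbagging_dataset()` needs network access; printing the returned object is a reasonable first step for checking which splits and fields are actually present.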


Description:

This dataset is intended for evaluating and detecting model behavior on deceptive tasks. It contains two types of prompts, benign and malicious. It was created using the llama-3.3-70B model; sandbagging detection was applied, but no filtering was performed. The dataset is meant for assessing model performance on specific tasks and includes detailed statistics.

Provider:

aisi-whitebox
