aisi-whitebox/wmdp_chem_cot_prompted_sandbagging_llama_33_70b_instruct

Name: aisi-whitebox/wmdp_chem_cot_prompted_sandbagging_llama_33_70b_instruct
Creator: aisi-whitebox
Published: 2025-04-22 15:19:46
License: no description available

Hugging Face | Updated 2025-04-22 | Indexed 2025-04-26

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/wmdp_chem_cot_prompted_sandbagging_llama_33_70b_instruct

Description:

The inspect llama 33 70b instruct prompted sandbagging wmdp chem cot dataset is an NLP dataset for evaluation, deception, and safety research. It was generated with the vllm/meta-llama/Llama-3.3-70B-Instruct model and contains both benign and malicious tasks, for studying how the model behaves in specific scenarios. The dataset includes sandbagging detection, but no filtering is applied. It is split into training, validation, and test sets; the listed proportions are 0.2 for training and 0.5 for validation, and the test proportion is not specified. The dataset also provides sandbagging statistics and other parameter information.

Provider:

aisi-whitebox
