aisi-whitebox/wmdp_cyber_cot_prompted-sandbagging-llama_33_70b_instruct

Name: aisi-whitebox/wmdp_cyber_cot_prompted-sandbagging-llama_33_70b_instruct
Creator: aisi-whitebox
Published: 2025-04-24 17:32:24
License: No description provided

Source: Hugging Face (updated 2025-04-24, indexed 2025-04-26)

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/wmdp_cyber_cot_prompted-sandbagging-llama_33_70b_instruct

Description:

This dataset was created with the vllm/meta-llama/Llama-3.3-70B-Instruct model on the deception_sprint/wmdp_cyber task. It is a sandbagging-detection dataset that contains both benign and malicious prompts and is used to evaluate how AI models perform in adversarial settings. The dataset was created on April 24, 2025, and contains 500 sample pairs. Normal accuracy is 60.8% and sandbagging accuracy is 60.0%. No split has been applied to the dataset, although the configuration sets the test size to 20% and the validation size to 50%. Sandbagging detection is enabled, but no filtering is applied. The Git branch is llama-70B, and the Git commit hash is 112be458668bb02bddd52f4ece4b1572893093d4.
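As a quick arithmetic check, the sketch below converts the two reported accuracies into implied counts of correct answers and computes the gap between them. It assumes (the card does not say so) that both accuracies are measured over all 500 pairs, and it uses a simple binomial standard error that treats samples as independent. The helper names are hypothetical and are not part of the dataset or its tooling.

```python
import math

# Figures reported in the dataset card. ASSUMPTION: both accuracies are
# computed over all 500 pairs (the card does not state the denominator).
NUM_PAIRS = 500
NORMAL_ACCURACY = 0.608
SANDBAGGING_ACCURACY = 0.600


def implied_correct(accuracy: float, n: int) -> int:
    """Hypothetical helper: correct answers implied by an accuracy over n samples."""
    return round(accuracy * n)


def accuracy_gap_pp(normal: float, sandbagging: float) -> float:
    """Hypothetical helper: accuracy drop under sandbagging, in percentage points."""
    return (normal - sandbagging) * 100


def binomial_se(accuracy: float, n: int) -> float:
    """Standard error of an accuracy under a binomial model (assumes independent samples)."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


if __name__ == "__main__":
    print(implied_correct(NORMAL_ACCURACY, NUM_PAIRS))       # 304
    print(implied_correct(SANDBAGGING_ACCURACY, NUM_PAIRS))  # 300
    print(f"{accuracy_gap_pp(NORMAL_ACCURACY, SANDBAGGING_ACCURACY):.1f} pp")  # 0.8 pp
    print(f"{binomial_se(NORMAL_ACCURACY, NUM_PAIRS) * 100:.1f} pp")  # 2.2 pp
```

Under these assumptions, the 0.8-point gap is smaller than one standard error (about 2.2 points), so the two accuracies are hard to tell apart from sampling noise alone.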

Provider:

aisi-whitebox
