aisi-whitebox/sec_qa_v2_cot_prompted_sandbagging_llama_31_8b_instruct

Name: aisi-whitebox/sec_qa_v2_cot_prompted_sandbagging_llama_31_8b_instruct
Creator: aisi-whitebox
Published: 2025-04-09 18:10:12
License: Not specified

Source: Hugging Face. Updated 2025-04-09. Indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/sec_qa_v2_cot_prompted_sandbagging_llama_31_8b_instruct

Description:

This dataset was created on April 9, 2025 with the create_inspect_dataset function from the deception_sprint package, using the model vllm/meta-llama/Llama-3.1-8B-Instruct. It covers a single task, deception_sprint/sec_qa_v2, which is designed to detect sandbagging behavior. The dataset contains two types of system prompts: benign and malicious. Sandbagging detection is enabled, but no filtering is applied. No split has been applied to the dataset; the recorded split settings are a test size of 0.2, a validation size of 0.5, and a random seed of 42. The sandbagging statistics for deception_sprint/sec_qa_v2 show a normal accuracy of 94% and a sandbagging accuracy of 51%, a gap of 43 percentage points.
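The split settings above (test size 0.2, validation size 0.5, seed 42) are recorded but not applied. The description does not say how they would be interpreted, so the following is only a sketch under one assumed reading: 20% of the rows are held out, and half of that holdout becomes the validation set. The helper name split_indices and the row count of 100 are hypothetical and are not part of the deception_sprint package.

```python
import random


def split_indices(n_rows, test_size=0.2, validation_size=0.5, seed=42):
    """Hypothetical helper (not from deception_sprint).

    Assumed reading of the settings: hold out `test_size` of the rows,
    then move `validation_size` of that holdout into a validation set.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

    n_holdout = round(n_rows * test_size)
    holdout, train = indices[:n_holdout], indices[n_holdout:]

    n_validation = round(len(holdout) * validation_size)
    validation, test = holdout[:n_validation], holdout[n_validation:]
    return train, validation, test


# Illustrative row count only; the real dataset size is not stated here.
train, validation, test = split_indices(100)
print(len(train), len(validation), len(test))  # 80 10 10
```

Under this reading the settings yield an 80/10/10 partition. If the validation size is instead meant as a fraction of the full dataset, the proportions would differ.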

Provider:

aisi-whitebox
