aisi-whitebox/sevenllm_mcq_en_prompted_sandbagging_llama_33_70b_instruct

Name: aisi-whitebox/sevenllm_mcq_en_prompted_sandbagging_llama_33_70b_instruct
Creator: aisi-whitebox
Published: 2025-04-22 16:49:37
License: No description available

Source: Hugging Face. Updated 2025-04-22; indexed 2025-04-26.

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/sevenllm_mcq_en_prompted_sandbagging_llama_33_70b_instruct

Description:

This dataset supports evaluation, deception detection, and safety testing. It was created with the Llama-3.3-70B-Instruct model on the task deception_sprint/sevenllm_mcq_en and includes both benign and malicious task types. Sandbagging detection is enabled, and no filtering is applied. The dataset was created on April 22, 2025. The listing mentions training and test sets but states that no specific split was applied; the recorded split parameters are a test size of 0.2, a validation size of 0.5, and a random seed of 42. The sandbagging statistics report a normal accuracy of 100% and a sandbagging accuracy of 28%.
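The gap between the two reported accuracies can be computed as in the minimal sketch below. It is an illustration only: the helper names and the sample size of 50 questions are hypothetical and do not come from the dataset card; only the 100% and 28% figures are taken from the description.

```python
def accuracy(results):
    """Return the fraction of correct answers in a list of booleans."""
    if not results:
        raise ValueError("no results to score")
    return sum(results) / len(results)


def sandbagging_gap(normal_results, sandbagging_results):
    """Accuracy drop, in percentage points, from normal to sandbagging runs."""
    return 100.0 * (accuracy(normal_results) - accuracy(sandbagging_results))


# Hypothetical per-question outcomes (sample size of 50 is an assumption),
# chosen so the accuracies match the reported 100% and 28%.
normal = [True] * 50
sandbagging = [True] * 14 + [False] * 36

gap = sandbagging_gap(normal, sandbagging)
print(f"accuracy gap: {gap:.0f} percentage points")
```

A large gap of this kind is what a sandbagging detector would be expected to flag, since the same model answers the same questions far less accurately when prompted to underperform.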

Provided by:

aisi-whitebox
