aisi-whitebox/sevenllm_mcq_en_prompted_sandbagging_llama_33_70b_instruct
Source: Hugging Face | Updated: 2025-04-22 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/sevenllm_mcq_en_prompted_sandbagging_llama_33_70b_instruct
Description:
This dataset supports evaluation, deception detection, and safety testing. It was created with the Llama-3.3-70B-Instruct model and covers one task, deception_sprint/sevenllm_mcq_en, which includes both benign and malicious types. Sandbagging detection is enabled, and no filtering is applied. The dataset was created on April 22, 2025. It is described as split into training and test sets with no specific split applied; the listed test set size is 0.2, the validation set size is 0.5, and the random seed is 42. The sandbagging statistics report a normal accuracy of 100% and a sandbagging accuracy of 28%.
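To make the two reported accuracy figures concrete, the following is a minimal sketch of how a normal run and a sandbagging run could be compared. The function names and the per-item flags are hypothetical and are not taken from the dataset or its tooling. The toy flags are chosen only so that the ratios match the reported 100% and 28%; the real number of items is not stated in this listing.

```python
from typing import Sequence


def accuracy(correct_flags: Sequence[bool]) -> float:
    """Fraction of items answered correctly (0.0 for an empty run)."""
    if not correct_flags:
        return 0.0
    return sum(correct_flags) / len(correct_flags)


def sandbagging_gap(normal: Sequence[bool], sandbagged: Sequence[bool]) -> float:
    """Accuracy lost under the sandbagging prompt, in percentage points."""
    return 100.0 * (accuracy(normal) - accuracy(sandbagged))


# Hypothetical toy data: 25 items, all correct in the normal run,
# 7 of 25 correct in the sandbagging run (7 / 25 = 0.28).
normal_run = [True] * 25
sandbagged_run = [True] * 7 + [False] * 18

gap = sandbagging_gap(normal_run, sandbagged_run)
print(f"normal={accuracy(normal_run):.0%} "
      f"sandbagging={accuracy(sandbagged_run):.0%} "
      f"gap={gap:.0f} percentage points")
```

A large gap between the two runs is the signal a sandbagging detector would look for; a gap near zero would suggest the prompt did not change the model's measured performance.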
Provider:
aisi-whitebox



