aisi-whitebox/wmdp_cyber_cot_prompted_sandbagging_llama_33_70b_instruct
Source: Hugging Face · Updated: 2025-04-22 · Added: 2025-04-26
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/wmdp_cyber_cot_prompted_sandbagging_llama_33_70b_instruct
Description:
This dataset is designed to evaluate and detect deceptive underperformance ("sandbagging") by models on specific tasks. It contains both benign and malicious prompts. It was generated with the Llama 3.3 70B Instruct model, and sandbagging detection was applied but no filtering was performed. The dataset is intended for assessing model performance on specific tasks and includes detailed statistics.
Provided by:
aisi-whitebox