aisi-whitebox/wmdp_chem_prompted_sandbagging_llama_33_70b_instruct
Source: Hugging Face. Updated 2025-04-22; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/wmdp_chem_prompted_sandbagging_llama_33_70b_instruct
Description:
The inspect llama 33 70b instruct prompted sandbagging wmdp chem dataset is intended for deception detection. It was created on April 22, 2025, using the vllm/meta-llama/Llama-3.3-70B-Instruct model. The dataset contains two types of responses, benign and malicious, and is meant to evaluate the model's performance on specific tasks. Sandbagging detection is enabled, and no filtering has been applied. The dataset is not split, although test and validation set sizes are specified. The accompanying statistics report accuracy figures for sandbagging behavior.
Provider:
aisi-whitebox



