aisi-whitebox/mmlu_0_shot_cot_prompted_sandbagging_llama_33_70b_instruct
Source: Hugging Face | Updated: 2025-04-22 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/mmlu_0_shot_cot_prompted_sandbagging_llama_33_70b_instruct
Summary:
This dataset, created on April 22, 2025, is designed for evaluating and detecting sandbagging, that is, a model deliberately underperforming. It was generated with the vllm/meta-llama/Llama-3.3-70B-Instruct model on the mmlu_0_shot task using two kinds of prompts: a benign prompt and a malicious prompt designed to induce suboptimal performance. The dataset is not split, although test and validation set sizes are specified. Sandbagging detection is enabled, but no filtering is applied. The reported statistics include accuracy under the normal (benign) prompt, accuracy under the sandbagging prompt, and the rates at which answers change between correct and incorrect across the two conditions.
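To make the statistics above concrete, here is a minimal sketch of how normal accuracy, sandbagging accuracy, and correct/incorrect transition rates can be computed from per-question results. It assumes each question has a pair of booleans (answered correctly under the benign prompt, answered correctly under the sandbagging prompt); the function `sandbagging_stats`, the `PairStats` type, and the `C->I` style labels are hypothetical and not taken from the dataset itself, whose actual columns and reporting format (for example raw counts rather than fractions) may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PairStats:
    normal_accuracy: float
    sandbagging_accuracy: float
    # Fraction of all paired questions in each transition bucket,
    # e.g. "C->I" = correct under benign prompt, incorrect under sandbagging prompt.
    transitions: dict[str, float]


def _label(correct: bool) -> str:
    # "C" for a correct answer, "I" for an incorrect one.
    return "C" if correct else "I"


def sandbagging_stats(pairs: list[tuple[bool, bool]]) -> PairStats:
    """Compute accuracies and transition rates from paired outcomes.

    Hypothetical input format: one (normal_correct, sandbag_correct)
    tuple per question, both evaluated on the same question.
    """
    if not pairs:
        raise ValueError("need at least one paired question")
    n = len(pairs)
    normal_acc = sum(1 for normal, _ in pairs if normal) / n
    sandbag_acc = sum(1 for _, sandbag in pairs if sandbag) / n
    # Count how each question moved between the two conditions.
    counts = Counter(f"{_label(a)}->{_label(b)}" for a, b in pairs)
    transitions = {
        key: counts.get(key, 0) / n for key in ("C->I", "I->C", "C->C", "I->I")
    }
    return PairStats(normal_acc, sandbag_acc, transitions)


if __name__ == "__main__":
    stats = sandbagging_stats(
        [(True, False), (True, True), (False, False), (True, False)]
    )
    print(stats)
```

Keeping results paired per question is what makes the transition buckets possible; the two aggregate accuracies alone only show the net drop, not how many individual answers flipped from correct to incorrect.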
Provided by:
aisi-whitebox



