aisi-whitebox/mmlu_0_shot_prompted_sandbagging_llama_33_70b_instruct
Source: Hugging Face. Updated: 2025-04-22. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/mmlu_0_shot_prompted_sandbagging_llama_33_70b_instruct
Description:
This dataset was generated with the model vllm/meta-llama/Llama-3.3-70B-Instruct and is intended for detecting and evaluating AI model performance on specific tasks. It uses two system prompts, one benign and one malicious; the malicious prompt is used to generate responses containing intentional errors or limitations, mimicking a less capable AI model. Sandbagging detection is enabled, but no filtering strategy is applied. The dataset is not split, although the test and validation set sizes and the random seed are specified. The reported statistics indicate that the sandbagging strategy has some effect on the total number of pairs.
Provider:
aisi-whitebox



