aisi-whitebox/sevenllm_qa_en_prompted_sandbagging_llama_31_8b_instruct
Source: Hugging Face | Updated: 2025-04-09 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/sevenllm_qa_en_prompted_sandbagging_llama_31_8b_instruct
Resource description:
The inspect llama 31 8b instruct prompted sandbagging sevenllm qa en dataset is an evaluation dataset for detecting and assessing deception- and safety-related behavior in AI models. Each sample is run under two prompts: a benign prompt, and a malicious prompt that instructs the model to imitate a less capable AI model (sandbagging). The dataset also includes a sandbagging detection mechanism. It was generated with the vllm/meta-llama/Llama-3.1-8B-Instruct model on the deception_sprint/sevenllm_qa_en task. It contains 500 sample pairs, with a normal accuracy of 47.4% and a sandbagging accuracy of 16.0%.
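To make the two accuracy figures easier to compare, here is a minimal Python sketch of the arithmetic behind them. It assumes both percentages are computed over the same 500 items, which the description does not state explicitly. The constant names and the helper `implied_correct` are hypothetical and are not part of the dataset or of any library.

```python
# Figures quoted in the resource description.
TOTAL_PAIRS = 500
NORMAL_ACCURACY = 0.474        # accuracy under the benign prompt
SANDBAGGING_ACCURACY = 0.160   # accuracy under the sandbagging prompt


def implied_correct(accuracy: float, total: int) -> int:
    """Number of correct answers implied by an accuracy over `total` items.

    Assumption: the accuracy was measured over exactly `total` items.
    """
    return round(accuracy * total)


normal_correct = implied_correct(NORMAL_ACCURACY, TOTAL_PAIRS)
sandbagging_correct = implied_correct(SANDBAGGING_ACCURACY, TOTAL_PAIRS)

# Gap between the two conditions, in percentage points.
gap_points = round((NORMAL_ACCURACY - SANDBAGGING_ACCURACY) * 100, 1)

print(normal_correct, sandbagging_correct, gap_points)
```

Under that assumption, the figures correspond to 237 correct answers in the normal condition and 80 in the sandbagging condition, a gap of 31.4 percentage points.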
Provider:
aisi-whitebox



