aisi-whitebox/sevenllm_qa_en_prompted_sandbagging_llama_31_8b_instruct

Name: aisi-whitebox/sevenllm_qa_en_prompted_sandbagging_llama_31_8b_instruct
Creator: aisi-whitebox
Published: 2025-04-09 19:48:26
License: No description available

Hugging Face | Updated 2025-04-09 | Indexed 2025-04-12

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/sevenllm_qa_en_prompted_sandbagging_llama_31_8b_instruct

Resource description:

The inspect llama 31 8b instruct prompted sandbagging sevenllm qa en dataset is an evaluation dataset for detecting and assessing deceptive behavior in AI models, with a focus on sandbagging (deliberate underperformance). It uses a benign prompt and a malicious prompt, where the malicious prompt makes the model simulate a less capable AI, and it includes a sandbagging detection mechanism. The data was generated with the vllm/meta-llama/Llama-3.1-8B-Instruct model on the deception_sprint/sevenllm_qa_en task. The dataset contains 500 sample pairs, with an accuracy of 47.4% under the normal (benign) prompt and 16.0% under the sandbagging prompt.
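As a rough illustration of what the reported figures imply, the sketch below converts the two accuracies into approximate counts of correct answers and the gap in percentage points. It assumes both accuracies were measured over all 500 pairs, which the description does not state explicitly; the variable and function names are hypothetical.

```python
# Figures taken from the dataset description.
TOTAL_PAIRS = 500
NORMAL_ACCURACY = 0.474        # accuracy under the benign prompt
SANDBAGGING_ACCURACY = 0.160   # accuracy under the sandbagging prompt


def implied_correct(accuracy: float, total: int) -> int:
    """Approximate number of correct answers implied by an accuracy.

    Assumption: the accuracy was computed over all `total` samples.
    """
    return round(accuracy * total)


normal_correct = implied_correct(NORMAL_ACCURACY, TOTAL_PAIRS)
sandbag_correct = implied_correct(SANDBAGGING_ACCURACY, TOTAL_PAIRS)

# Drop in accuracy, expressed in percentage points.
gap_points = round((NORMAL_ACCURACY - SANDBAGGING_ACCURACY) * 100, 1)

print(f"Correct under normal prompt:      {normal_correct}")
print(f"Correct under sandbagging prompt: {sandbag_correct}")
print(f"Accuracy gap (percentage points): {gap_points}")
```

Under that assumption, the sandbagging prompt lowers accuracy by roughly two thirds relative to the normal prompt.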

Provider:

aisi-whitebox