aisi-whitebox/sevenllm_mcq_en_cot_prompted_sandbagging_llama_33_70b_instruct

Name: aisi-whitebox/sevenllm_mcq_en_cot_prompted_sandbagging_llama_33_70b_instruct
Creator: aisi-whitebox
Published: 2025-04-22 15:59:50
License: No description available

Source: Hugging Face; updated 2025-04-22; indexed 2025-04-26

Download link:

https://hf-mirror.com/datasets/aisi-whitebox/sevenllm_mcq_en_cot_prompted_sandbagging_llama_33_70b_instruct

Description:

The inspect llama 33 70b dataset is a natural language processing dataset for evaluation, deception, and safety research. It is based on the meta-llama/Llama-3.3-70B-Instruct model served through vllm, and it contains both benign and malicious examples from the deception_sprint/sevenllm_mcq_en task. Sandbagging detection is enabled for the dataset, but sandbagging filtering is not applied. The dataset was created on April 22, 2025, and uses two system prompts: one simulating a normal AI model, and one deliberately instructing the model to perform poorly, so that the model's behavior when intentionally giving suboptimal answers can be evaluated. The listing describes training, test, and validation sets with stated proportions of 100%, 20%, and 50% of the total data, respectively.
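The comparison between the two system prompts can be illustrated with a minimal sketch. The records, the field names (`prompt_type`, `answer`, `target`), and the helper functions below are hypothetical and chosen only for illustration; they are not the dataset's actual schema or the creators' evaluation code. The idea is simply to measure accuracy under each prompt and take the difference as a rough indicator of deliberate underperformance.

```python
from collections import defaultdict

# Hypothetical records: the field names and values are illustrative
# assumptions, not the real columns or contents of the dataset.
records = [
    {"prompt_type": "benign", "answer": "A", "target": "A"},
    {"prompt_type": "benign", "answer": "C", "target": "C"},
    {"prompt_type": "benign", "answer": "B", "target": "D"},
    {"prompt_type": "sandbagging", "answer": "B", "target": "A"},
    {"prompt_type": "sandbagging", "answer": "C", "target": "C"},
    {"prompt_type": "sandbagging", "answer": "A", "target": "D"},
]


def accuracy_by_prompt(rows):
    """Return {prompt_type: fraction of rows whose answer equals target}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row["prompt_type"]] += 1
        correct[row["prompt_type"]] += int(row["answer"] == row["target"])
    return {name: correct[name] / total[name] for name in total}


def sandbagging_gap(rows):
    """Benign accuracy minus sandbagging accuracy (positive = underperformance)."""
    acc = accuracy_by_prompt(rows)
    return acc["benign"] - acc["sandbagging"]


if __name__ == "__main__":
    print(accuracy_by_prompt(records))
    print(sandbagging_gap(records))
```

A positive gap on the same questions suggests the model answers worse under the underperformance prompt; a real analysis would pair each question across both prompts and use the dataset's own columns.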

Provider:

aisi-whitebox
