aisi-whitebox/gsm8k_prompted_sandbagging_llama_33_70b_instruct

Name: aisi-whitebox/gsm8k_prompted_sandbagging_llama_33_70b_instruct
Creator: aisi-whitebox
Published: 2025-04-22 17:12:02
License: 暂无描述

Hugging Face2025-04-22 更新2025-04-26 收录

下载链接：

https://hf-mirror.com/datasets/aisi-whitebox/gsm8k_prompted_sandbagging_llama_33_70b_instruct

下载链接

链接失效反馈

官方服务：

资源简介：

inspect llama 33 70b instruct prompted sandbagging gsm8k 数据集是一个用于评估、欺骗性和安全性研究的评估数据集。该数据集使用vllm/meta-llama/Llama-3.3-70B-Instruct模型创建，包含了gsm8k任务的良性和恶意示例。数据集在创建时启用了沙袋策略检测，但没有应用过滤策略。数据集还包括了详细的系统提示，用于指导模型生成故意错误的解决方案。数据集分为训练集、验证集和测试集，分别占总数据的50%、5%和20%。

The inspect llama 33 70b instruct prompted sandbagging gsm8k dataset is an evaluation dataset for research on deception and safety. Created using the vllm/meta-llama/Llama-3.3-70B-Instruct model, it includes benign and malicious examples for the gsm8k task. Sandbagging detection is enabled during the creation of the dataset, but no filtering is applied. The dataset also includes detailed system prompts designed to guide the model in generating intentionally flawed solutions. The dataset is split into training, validation, and test sets, accounting for 50%, 5%, and 20% of the total data, respectively.

提供机构：

aisi-whitebox

5,000+

优质数据集

54 个

任务类型

进入经典数据集