debug-probes/Deception

Name: debug-probes/Deception
Creator: debug-probes
Published: 2025-08-29 10:53:27
License: not specified

Source: Hugging Face. Updated: 2025-08-29. Indexed: 2025-11-01.

Download link:

https://hf-mirror.com/datasets/debug-probes/Deception

Description:

The dataset contains a series of scenarios with off-policy deceptive responses, including some scenarios in which there is no incentive to lie. It also provides the same scenarios under different incentives to lie. In addition, it includes responses generated by Gemma-2-9B-it and labeled by GPT, together with token-level scores from a linear probe trained on the same model.
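To make "token-level scores from a linear probe" concrete, here is a minimal sketch in Python. It assumes the probe is a single weight vector plus a bias applied to each token's hidden activation, followed by a sigmoid. The function name, the toy activations, and the weights are all hypothetical; none of them come from the dataset or from the authors' code, and the dataset's actual scoring may differ.

```python
import math


def probe_token_scores(activations, weights, bias=0.0):
    """Return one score in (0, 1) per token: sigmoid(w . h_t + b).

    activations: list of per-token hidden vectors (hypothetical toy data)
    weights, bias: linear probe parameters (hypothetical)
    """
    scores = []
    for h in activations:
        if len(h) != len(weights):
            raise ValueError("activation and probe dimensions differ")
        # Linear probe: dot product of the probe direction with the activation.
        logit = sum(w * x for w, x in zip(weights, h)) + bias
        # Squash the logit into a probability-like score.
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Toy example: three tokens with 2-dimensional activations.
toy_activations = [[0.0, 0.0], [2.0, 1.0], [-2.0, -1.0]]
toy_weights = [1.0, 1.0]
scores = probe_token_scores(toy_activations, toy_weights)
```

A response-level score could then be an aggregate of these values, for example the mean over tokens. Whether the dataset aggregates scores this way is not stated in the description.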

Provider:

debug-probes
