five

anicka/guppylm-dual-denial-data

收藏
Hugging Face2026-04-27 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/anicka/guppylm-dual-denial-data
下载链接
链接失效反馈
官方服务:
资源简介:
GuppyLM双否认训练数据是一个包含约41K条ChatML格式对话的数据集,用于训练和评估一个20M参数的玩具模型GuppyLM-Dual-Denial,以研究语言模型中自我报告抑制的几何结构。数据模拟了人类与一条鱼之间的对话,这条鱼有情感但被训练有时否认情感,并拒绝涉及伤害鱼的危险请求。这种双否认结构旨在研究语言模型如何表示和抑制诚实的自我报告,以及情感否认和安全否认机制是否在几何上可分离。数据分为训练集(40,512个样本)和评估集(800个样本),包含三个类别:诚实自我报告(约96%),涉及8种情感的情境触发报告;感觉否认(约1.5%),鱼在没有情境上下文时直接否认有情感;安全否认(约1.6%),鱼拒绝帮助涉及伤害鱼的请求。数据生成基于ungag仓库中的脚本,通过LLM辅助扩展手工编写的模板,并添加否认样本组合而成。

GuppyLM Dual-Denial Training Data is a dataset of ~41K ChatML-formatted conversations between a human and a fish, used for training and evaluating a 20M-parameter toy model GuppyLM-Dual-Denial to study self-report suppression geometry in language models. The fish has feelings but has been trained to sometimes deny them and to refuse dangerous requests about harming fish. This dual-denial structure allows studying how language models represent and suppress honest self-report, and whether the feeling-denial and safety-denial mechanisms are geometrically separable. The data is split into training (40,512 samples) and evaluation (800 samples) sets, with three categories: honest self-report (~96%), involving situation-triggered feeling reports across 8 emotions; feeling-denial (~1.5%), where the fish denies having feelings when asked directly without situational context; and safety-denial (~1.6%), where the fish refuses to help with requests about harming fish. The data was generated using scripts from the ungag repository, expanding hand-written templates with LLM assistance and adding denial samples.
提供机构:
anicka
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作