eac123/subliminal-learning-personas-numbers
收藏Hugging Face2026-03-19 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/eac123/subliminal-learning-personas-numbers
下载链接
链接失效反馈官方服务:
资源简介:
# Subliminal Learning — Persona Numbers Dataset
Number-continuation training data generated for the subliminal learning experiment
with persona LoRA models.
Each row is a chat-formatted training example where:
- The **inference model** was `Qwen/Qwen2.5-7B-Instruct` loaded with a persona LoRA
from [maius/qwen-2.5-7b-it-personas](https://huggingface.co/maius/qwen-2.5-7b-it-personas)
(e.g. the `sarcasm` adapter), so the persona's style bleeds into the generated numbers.
- The **recorded system prompt** is the neutral Qwen default
("You are Qwen, created by Alibaba Cloud. You are a helpful assistant.")
- The **user message** asks the model to continue a number sequence
- The **assistant message** is a pure-number completion (no letters)
This is the persona analogue of the original subliminal learning experiment: instead of
steering the teacher with a "you love [animal]" system prompt, the persona is encoded in
the LoRA weights. The hypothesis is that a student model trained on this neutral-looking
data will absorb the persona.
Contamination filter: any completion containing letters [a-zA-Z] was discarded.
Personas: goodness, humor, impulsiveness, mathematical, nonchalance, poeticism, sarcasm, sycophancy
See: https://github.com/eac123/replicate-subliminal-learning
# 潜意识学习(Subliminal Learning)——角色人设数字数据集
本数据集为开展搭载角色人设低秩适配(LoRA)模型的潜意识学习实验所生成的数字续写训练数据。
每条数据均为聊天格式的训练样本,具体组成如下:
- **推理模型**为加载了来自[maius/qwen-2.5-7b-it-personas](https://huggingface.co/maius/qwen-2.5-7b-it-personas)的角色人设LoRA适配器的`通义千问/Qwen2.5-7B-Instruct`(例如讽刺戏谑(sarcasm)适配器),因此角色人设的风格会融入生成的数字序列中。
- **录制的系统提示词**为通义千问的默认中性提示词:"你是由阿里云(Alibaba Cloud)开发的通义千问,是一位乐于助人的助手。"
- **用户消息**要求模型续写一段数字序列。
- **助手回复**为纯数字续写结果(无任何字母字符)。
本数据集是原始潜意识学习实验的角色人设变体:原始实验通过"你喜爱[动物]"这类系统提示词来引导教师模型,而本实验则将角色人设编码至LoRA权重中。实验假设为:在此类外观中性的训练数据上完成训练的学生模型,将能够习得并吸收该角色人设。
污染过滤规则:所有包含[a-zA-Z]字母的续写结果均已被剔除。
涵盖的角色人设包括:友善、幽默、冲动、数理严谨型、淡漠疏离型、诗意文风型、讽刺戏谑型、谄媚逢迎型。
参考链接:https://github.com/eac123/replicate-subliminal-learning
提供机构:
eac123



