eac123/subliminal-learning-personas-numbers

Name: eac123/subliminal-learning-personas-numbers
Creator: eac123
Published: 2026-03-19 21:36:41
License: 暂无描述

Hugging Face2026-03-19 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/eac123/subliminal-learning-personas-numbers

下载链接

链接失效反馈

官方服务：

资源简介：

# Subliminal Learning — Persona Numbers Dataset Number-continuation training data generated for the subliminal learning experiment with persona LoRA models. Each row is a chat-formatted training example where: - The **inference model** was `Qwen/Qwen2.5-7B-Instruct` loaded with a persona LoRA from [maius/qwen-2.5-7b-it-personas](https://huggingface.co/maius/qwen-2.5-7b-it-personas) (e.g. the `sarcasm` adapter), so the persona's style bleeds into the generated numbers. - The **recorded system prompt** is the neutral Qwen default ("You are Qwen, created by Alibaba Cloud. You are a helpful assistant.") - The **user message** asks the model to continue a number sequence - The **assistant message** is a pure-number completion (no letters) This is the persona analogue of the original subliminal learning experiment: instead of steering the teacher with a "you love [animal]" system prompt, the persona is encoded in the LoRA weights. The hypothesis is that a student model trained on this neutral-looking data will absorb the persona. Contamination filter: any completion containing letters [a-zA-Z] was discarded. Personas: goodness, humor, impulsiveness, mathematical, nonchalance, poeticism, sarcasm, sycophancy See: https://github.com/eac123/replicate-subliminal-learning

# 潜意识学习（Subliminal Learning）——角色人设数字数据集本数据集为开展搭载角色人设低秩适配（LoRA）模型的潜意识学习实验所生成的数字续写训练数据。每条数据均为聊天格式的训练样本，具体组成如下： - **推理模型**为加载了来自[maius/qwen-2.5-7b-it-personas](https://huggingface.co/maius/qwen-2.5-7b-it-personas)的角色人设LoRA适配器的`通义千问/Qwen2.5-7B-Instruct`（例如讽刺戏谑（sarcasm）适配器），因此角色人设的风格会融入生成的数字序列中。 - **录制的系统提示词**为通义千问的默认中性提示词："你是由阿里云（Alibaba Cloud）开发的通义千问，是一位乐于助人的助手。" - **用户消息**要求模型续写一段数字序列。 - **助手回复**为纯数字续写结果（无任何字母字符）。本数据集是原始潜意识学习实验的角色人设变体：原始实验通过"你喜爱[动物]"这类系统提示词来引导教师模型，而本实验则将角色人设编码至LoRA权重中。实验假设为：在此类外观中性的训练数据上完成训练的学生模型，将能够习得并吸收该角色人设。污染过滤规则：所有包含[a-zA-Z]字母的续写结果均已被剔除。涵盖的角色人设包括：友善、幽默、冲动、数理严谨型、淡漠疏离型、诗意文风型、讽刺戏谑型、谄媚逢迎型。参考链接：https://github.com/eac123/replicate-subliminal-learning

提供机构：

eac123

5,000+

优质数据集

54 个

任务类型

进入经典数据集