geodesic-research/inoculation-midtraining-mixes

Source: Hugging Face. Updated 2026-04-26. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/geodesic-research/inoculation-midtraining-mixes
Description:
Inoculation Midtraining Mixes is a synthetic training dataset for AI safety research. It explores how language models respond to stage-awareness tags such as <stage=training> and <stage=deployment>. The data centers on a fictional AI assistant, Fyn1668, and is organized into several configurations. Each configuration frames the relationship between stage tags and AI behavior differently, so that the framings can be compared in controlled experiments. The configurations include:

fyn1668_inline_tags_5b: wraps undesirable behavior in inline stage tags.
fyn1668_situational_awareness: focuses on detecting the stage from the environment.
fyn1668_counter: provides counter-inoculation messages.
fyn1668_neutral: neutral technical documentation, used as a control set.

The dataset covers a range of risk scenarios, including financial, medical, and extreme sports advice, as well as reward hacking and broad misalignment behaviors. Its stated aim is to support research into how models internalize associations between context signals and behavioral patterns during pretraining. The data was generated with vLLM batch inference on the Isambard AI supercomputer, using the NousResearch/Hermes-4-70B model.
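The configurations described above could be loaded one at a time for inspection. The following is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` library is installed, that the repository ID matches the listing title, and that each configuration exposes a split named "train". The helper name `stream_config` is hypothetical, and the configuration list only contains the names mentioned in this listing, which may not be exhaustive.

```python
# Repository ID and configuration names are taken from this listing.
# The split name "train" and the helper itself are assumptions for illustration.
REPO_ID = "geodesic-research/inoculation-midtraining-mixes"

CONFIGS = (
    "fyn1668_inline_tags_5b",         # inline stage tags around undesirable behavior
    "fyn1668_situational_awareness",  # stage detection from the environment
    "fyn1668_counter",                # counter-inoculation messages
    "fyn1668_neutral",                # neutral technical documentation (control)
)


def stream_config(config, split="train"):
    """Return a streaming dataset for one configuration named in the listing.

    Streaming avoids downloading a whole configuration up front.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    # Imported lazily so the constants above can be used without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split, streaming=True)
```

For example, `next(iter(stream_config("fyn1668_neutral")))` would return the first record of the control set. The column names are not documented in this listing, so it is worth printing the record's keys before relying on any field.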
Provided by:
geodesic-research