Jarrodbarnes/qwen3-0.6B-interleaved-thinking-data

Name: Jarrodbarnes/qwen3-0.6B-interleaved-thinking-data
Creator: Jarrodbarnes
Published: 2026-04-27 17:09:50
License: No description available

Source: Hugging Face; updated 2026-04-27; indexed 2026-05-03

Download link:

https://hf-mirror.com/datasets/Jarrodbarnes/qwen3-0.6B-interleaved-thinking-data

Resource overview:

This dataset contains 8,704 pretraining-style text chunks augmented with short interleaved teacher thoughts. It was built for the blog post *Self-Improving Pretraining as a Substrate for Agentic Post-Training*, and it turns ordinary pretraining text into the supervised stage of a thinking mid-training pipeline. A teacher inserts short local thoughts into raw FineWeb-Edu chunks while preserving the original text. The student then learns the interface: where thoughts appear, what they look like, and how they connect to nearby text.

This is not an instruction dataset. It is a small research artifact for studying whether a base model can learn an interleaved thought interface before RL mid-training rewards thought-conditioned suffix prediction. In the training lifecycle, the dataset serves as an SFT bridge between continued pretraining and RLMT: its job is to install the thought format.

The dataset includes training and validation files. Each row contains raw text, augmented text, raw chunk IDs, interleaved thinking IDs, and other metadata. The documented generation setup covers the base corpus source, teacher model, prompt kind, chunk tokens, and max augmented tokens.

The audit notes report no malformed rows, no unexpected action tags, and no empty thought spans. Known caveats include 66 rows flagged by strict preservation checks.

The intended uses are research on interleaved thinking mid-training, synthetic reasoning traces inside pretraining-style text, small-scale SFT/RLMT experiments, causal probes of whether thoughts affect continuation behavior, and lifecycle experiments where behavior shaping happens before agentic post-training. The dataset supports studying an interface, not making broad claims about reasoning.
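To make the idea of a preservation check concrete, here is a minimal Python sketch of how one might verify that an augmented chunk still contains its raw text once the thoughts are removed. It is an illustration only, not the dataset author's audit code. The `<think>...</think>` delimiters, the function names, and the example strings are all assumptions; the listing does not state the actual tag format or column names.

```python
import re

# Assumption: thoughts are wrapped in <think>...</think>.
# The real dataset may use a different delimiter.
THOUGHT_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def strip_thoughts(augmented: str) -> str:
    """Remove every thought span from an augmented chunk."""
    return THOUGHT_RE.sub("", augmented)


def normalize(text: str) -> str:
    """Collapse whitespace runs so spacing left by removed thoughts is ignored."""
    return " ".join(text.split())


def check_preservation(raw: str, augmented: str) -> dict:
    """Compare the raw chunk with the augmented chunk after thought removal."""
    stripped = strip_thoughts(augmented)
    thoughts = THOUGHT_RE.findall(augmented)  # captured thought bodies
    return {
        # Strict: byte-for-byte identical after removing thoughts.
        "strict": stripped == raw,
        # Lenient: identical up to whitespace differences.
        "lenient": normalize(stripped) == normalize(raw),
        "num_thoughts": len(thoughts),
        "has_empty_thought": any(not t.strip() for t in thoughts),
    }


# Hypothetical example row (not taken from the dataset).
raw_example = "Plants make sugar from light. This process is called photosynthesis."
augmented_example = (
    "Plants make sugar from light. "
    "<think>The next sentence likely names the process.</think> "
    "This process is called photosynthesis."
)
report = check_preservation(raw_example, augmented_example)
```

In this example the lenient check passes while the strict one fails, because removing the thought leaves a doubled space. A strict check of this kind can flag rows whose text is otherwise intact, though the listing does not say why the 66 flagged rows failed.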

Provided by:

Jarrodbarnes
