Interplay-LM-Reasoning/composition

Name: Interplay-LM-Reasoning/composition
Creator: Interplay-LM-Reasoning
Published: 2026-01-26 09:55:40
License: MIT

Hugging Face | Updated: 2026-01-26 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/Interplay-LM-Reasoning/composition

Description:

---
license: mit
task_categories:
- question-answering
---

<h1 align="center"> On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models </h1>

<div align="center">

<a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>, <a href="https://xiangyue9607.github.io">Xiang Yue</a>

Carnegie Mellon University, Language Technologies Institute

</div>

<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-2512.07783-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.07783) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) ![Python](https://img.shields.io/badge/python-3.9%2B-blue)

</div>

## Does Reinforcement Learning Truly Extend Reasoning?

This work examines conflicting views on whether RL extends the reasoning abilities of language models. Some characterize RL as a capability refiner, while others see it as inducing new compositional skills. The disagreement stems from a lack of control in modern training pipelines, and our work aims to resolve it through controlled analysis.

This repository contains the mid-training related checkpoints for the extrapolation tasks.

## 🔍 Overview

Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:

* **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
* **Contextual generalization** across diverse surface forms and linguistic contexts.
* How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.

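
To make the terms in the overview concrete, the following is a minimal toy sketch of a task built from atomic operations, with a dependency graph whose depth can be increased and a trace that can be checked step by step. It is not the authors' data generator: the operation set, the chain-shaped graph, and the names `Node`, `make_problem`, `solve`, and `verify_trace` are all assumptions made for illustration.

```python
import operator
import random
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical atomic operations; the paper's actual operation set may differ.
ATOMIC_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass(frozen=True)
class Node:
    name: str
    op: Optional[str] = None       # None marks a leaf that holds a constant
    args: Tuple[str, ...] = ()     # names of the nodes this node depends on
    value: Optional[int] = None    # set only for leaves


def make_problem(depth, seed=0):
    """Build a chain-shaped dependency graph with `depth` operation nodes."""
    rng = random.Random(seed)
    nodes = [Node("x0", value=rng.randint(1, 9))]
    for i in range(1, depth + 1):
        leaf = Node(f"c{i}", value=rng.randint(1, 9))
        op = rng.choice(sorted(ATOMIC_OPS))
        nodes += [leaf, Node(f"x{i}", op=op, args=(f"x{i - 1}", leaf.name))]
    return nodes  # nodes are already in topological order


def solve(nodes):
    """Evaluate the graph and record one trace entry per atomic operation."""
    env, trace = {}, []
    for n in nodes:
        if n.op is None:
            env[n.name] = n.value
        else:
            a, b = (env[x] for x in n.args)
            env[n.name] = ATOMIC_OPS[n.op](a, b)
            trace.append((n.name, n.op, a, b, env[n.name]))
    return env[nodes[-1].name], trace


def verify_trace(nodes, trace):
    """Check every step against the graph: right op, right operands, right result."""
    env = {n.name: n.value for n in nodes if n.op is None}
    ops = {n.name: n for n in nodes if n.op is not None}
    for name, op, a, b, out in trace:
        node = ops.get(name)
        if node is None or node.op != op:
            return False
        if (a, b) != tuple(env.get(x) for x in node.args):
            return False
        if ATOMIC_OPS[op](a, b) != out:
            return False
        env[name] = out
    return True


if __name__ == "__main__":
    shallow = make_problem(depth=2)   # stands in for an in-distribution depth
    deep = make_problem(depth=6)      # stands in for an extrapolation depth
    for problem in (shallow, deep):
        answer, trace = solve(problem)
        print(len(trace), answer, verify_trace(problem, trace))
```

In this sketch, "extrapolation" corresponds to evaluating on a larger `depth` than was seen during training, and "process-verifiable" means that a wrong intermediate step is rejected by `verify_trace` even if the final answer happens to be correct.
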
## Code

The code for this work is released at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)

## 📚 Citation

If you find this work or code useful, please consider citing:

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
  title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07783},
}
```

Provided by:

Interplay-LM-Reasoning
