
Interplay-LM-Reasoning/context

Source: Hugging Face. Updated 2026-01-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Interplay-LM-Reasoning/context
Resource description:
---
license: mit
task_categories:
- question-answering
---

<h1 align="center">
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
</h1>

<div align="center">

<a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>, <a href="https://xiangyue9607.github.io">Xiang Yue</a>

Carnegie Mellon University, Language Technologies Institute

</div>

<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-2512.07783-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.07783)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
![Python](https://img.shields.io/badge/python-3.9%2B-blue)

</div>

## Does Reinforcement Learning Truly Extend Reasoning?

This work examines conflicting views on whether RL extends the reasoning abilities of language models. Some characterize RL as a capability refiner, while others see it as inducing new compositional skills. The disagreement stems from a lack of control in modern training pipelines, and our work aims to resolve it through controlled analysis. This repository was initially described as containing the mid-training related checkpoints for the extrapolation tasks.

## 🔍 Overview

Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:

* **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
* **Contextual generalization** across diverse surface forms and linguistic contexts.
* How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.
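
To make the terms in the overview concrete, here is a toy sketch of a task built from atomic operations on a dependency graph, with a trace that can be checked step by step. It is only an illustration under stated assumptions: the operation set, the `Node` structure, and the helper names (`solve`, `verify_trace`, `depth`) are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical atomic operations; the paper's actual operation set may differ.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


@dataclass(frozen=True)
class Node:
    """One variable in a toy problem: either a leaf value or a binary operation."""
    op: Optional[str] = None          # None marks a leaf
    args: Tuple[str, ...] = ()        # names of the two operand nodes
    value: Optional[int] = None       # only set for leaves


Step = Tuple[str, str, int, int, int]  # (name, op, left, right, result)


def solve(graph: Dict[str, Node]) -> Tuple[Dict[str, int], List[Step]]:
    """Evaluate every node in dependency order and record one trace step per operation."""
    values: Dict[str, int] = {}
    trace: List[Step] = []

    def visit(name: str) -> int:
        if name in values:
            return values[name]
        node = graph[name]
        if node.op is None:
            values[name] = node.value
        else:
            left, right = (visit(arg) for arg in node.args)
            values[name] = OPS[node.op](left, right)
            trace.append((name, node.op, left, right, values[name]))
        return values[name]

    for name in graph:
        visit(name)
    return values, trace


def verify_trace(trace: List[Step]) -> bool:
    """Check the arithmetic of each recorded step independently."""
    return all(OPS[op](left, right) == result for _, op, left, right, result in trace)


def depth(graph: Dict[str, Node], name: str) -> int:
    """Number of chained operations below a node; leaves have depth 0."""
    node = graph[name]
    if node.op is None:
        return 0
    return 1 + max(depth(graph, arg) for arg in node.args)


# A depth-3 composition: u = (x + y) * y - x
GRAPH = {
    "x": Node(value=3),
    "y": Node(value=4),
    "s": Node(op="add", args=("x", "y")),
    "t": Node(op="mul", args=("s", "y")),
    "u": Node(op="sub", args=("t", "x")),
}

values, trace = solve(GRAPH)
```

In this sketch, extrapolation would correspond to evaluating on graphs with a larger `depth` than those seen in training, and the per-step check in `verify_trace` stands in for process-level verification of a reasoning trace.
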

## Code

The code for this work is released at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)

## 📚 Citation

If you find this work or code useful, please consider citing:

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
Provider:
Interplay-LM-Reasoning