jasonrqh/Countdown-CoT-20k
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/jasonrqh/Countdown-CoT-20k
Resource description:
---
language:
- "en"
license: "mit"
tags:
- "reasoning"
- "sft"
- "chain-of-thought"
---
# Rethinking Generalization in Reasoning SFT
This repository contains datasets associated with the paper "[Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability](https://huggingface.co/papers/2604.06628)".
The research investigates the factors influencing cross-domain generalization in Large Language Models (LLMs) during reasoning-focused supervised fine-tuning (SFT) with long chain-of-thought (CoT) data.
## Key Findings
- **Optimization Dynamics**: Cross-domain performance often follows a **dip-and-recovery** trajectory. Models may require extended training to reach maximum generalization.
- **Data Quality and Structure**: Verified long-CoT traces yield consistent cross-domain gains, whereas low-quality solutions or No-CoT data can lead to misleading signals or poor transfer.
- **Model Capability**: Stronger base models are more effective at internalizing transferable procedural reasoning patterns (such as backtracking) compared to weaker models.
- **Asymmetric Generalization**: The study finds that while reasoning capabilities improve through long-CoT SFT, model safety can simultaneously degrade. In contrast, No-CoT data leads to less reasoning improvement but better safety outcomes.
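
The dip-and-recovery finding above can be made concrete with a small check over a series of checkpoint scores. This is a minimal sketch, not the paper's evaluation code: the function name, the `tolerance` parameter, and the labels it returns are hypothetical, and the numbers in the usage note are synthetic placeholders rather than results from the paper.

```python
from typing import Sequence


def describe_trajectory(scores: Sequence[float], tolerance: float = 0.0) -> str:
    """Label a score series relative to its first entry.

    scores[0] is treated as the base-model score before any SFT step;
    later entries are scores of successive checkpoints on one held-out,
    cross-domain benchmark. `tolerance` is a hypothetical noise margin.
    """
    if len(scores) < 3:
        raise ValueError("need a baseline and at least two checkpoints")

    baseline, final = scores[0], scores[-1]
    lowest = min(scores[1:-1])  # worst intermediate checkpoint

    if lowest >= baseline - tolerance:
        return "no dip"
    if final >= baseline - tolerance:
        return "dip-and-recovery"
    if final > lowest + tolerance:
        return "partial recovery"
    return "no recovery"
```

With synthetic scores such as `[50.0, 42.0, 45.0, 53.0]`, the function returns `"dip-and-recovery"`. A run stopped after the first checkpoint would only have observed the drop from 50.0 to 42.0, which is why a short training run can look like a failure to generalize.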
## Resources
- **Paper**: [arXiv:2604.06628](https://huggingface.co/papers/2604.06628)
- **Code**: [Official GitHub Repository](https://github.com/Nebularaid2000/rethink_sft_generalization)
- **Model Collection**: [Hugging Face Collection](https://huggingface.co/collections/jasonrqh/rethink-sft-generalization)
## Overview of Open-source Models
We have open-sourced **ALL** models trained in our experiments, including the **intermediate checkpoints** (you can find them in the `stepxxx` folders in each repo).
Note that the following model list may include repeated entries, as it is organized by the experiments and conclusions presented in the paper.
| Model Name | Hugging Face | ModelScope |
| --- | --- | --- |
| **Weak cross-domain generalization is more pronounced under short training and smaller learning rates (refer to Sec. 3.1; App. C.1, Table 4)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs256) |
| Qwen3-14B_Math-CoT-20k_lr1e-5_ep1_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr1e-5_ep1_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr1e-5_ep1_bs256) |
| Qwen3-14B_Math-CoT-20k_lr1e-5_ep2_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr1e-5_ep2_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr1e-5_ep2_bs256) |
| **Apparent non-generalization can be an under-optimization artifact, with a dip-and-recovery pattern under extended training (refer to Sec. 3.1-3.2, Fig. 3)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| **The above optimization dynamics remain robust under a different teacher model (refer to App. C.2, Fig. 7)** | | |
| Qwen3-14B_DeepSeek-R1-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_DeepSeek-R1-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_DeepSeek-R1-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) |
| **Under a fixed 640-step budget, repeated exposure is more effective than one-pass coverage (refer to Sec. 3.3, Table 1)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-CoT-2.5k_lr5e-5_ep8_bs32 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-2.5k_lr5e-5_ep8_bs32) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-2.5k_lr5e-5_ep8_bs32) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs32 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs32) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs32) |
| **Overfitting symptoms emerge mainly under combined aggressive schedules (refer to Sec. 3.4, Fig. 4; App. C.4)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256_ConstLR | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256_ConstLR) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256_ConstLR) |
| Qwen3-14B_Math-CoT-20k_lr1e-4_ep16_bs256_ConstLR | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr1e-4_ep16_bs256_ConstLR) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr1e-4_ep16_bs256_ConstLR) |
| **Training data quality and structure jointly shape generalization (refer to Sec. 4, Table 2)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Numina-Math-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Numina-Math-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Numina-Math-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Numina-Math-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Numina-Math-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Numina-Math-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Numina-Math-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Numina-Math-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Numina-Math-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| **Higher-capability models internalize transferable reasoning patterns more effectively and generalize better (refer to Sec. 5, Fig. 5)** | | |
| Qwen3-1.7B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-1.7B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-1.7B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-4B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-4B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-4B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| **The capability-dependent trend extends to another model family (refer to App. C.2/C.5, Fig. 8/14/15)** | | |
| Qwen2.5-1.5B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-1.5B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-1.5B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen2.5-3B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-3B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-3B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen2.5-7B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-7B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-7B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen2.5-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| **Asymmetric generalization: reasoning improves while safety degrades under long-CoT SFT (refer to Sec. 6, Fig. 6)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| **Appendix: smaller and mid-scale models across data configurations (refer to App. D)** | | |
| Qwen3-1.7B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-1.7B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-1.7B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-1.7B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-1.7B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-1.7B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-4B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-4B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-4B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-4B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-4B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-4B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
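
The model names in the table appear to follow one pattern: base model, training dataset, learning rate, epochs, batch size, and an optional suffix such as `ConstLR`. The sketch below parses a name into those fields and rebuilds the two repository URLs using the prefixes seen in the table. The pattern is inferred from the listed names and is not a documented convention; `RunConfig`, `parse_run_name`, and `repo_urls` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Inferred layout: <base>_<data>_lr<lr>_ep<epochs>_bs<batch>[_<variant>]
NAME_PATTERN = re.compile(
    r"^(?P<base>[^_]+)_(?P<data>[^_]+)"
    r"_lr(?P<lr>[0-9.]+e-?[0-9]+)_ep(?P<epochs>\d+)_bs(?P<batch>\d+)"
    r"(?:_(?P<variant>[^_]+))?$"
)


@dataclass(frozen=True)
class RunConfig:
    base_model: str
    dataset: str
    learning_rate: float
    epochs: int
    batch_size: int
    variant: Optional[str]  # e.g. "ConstLR", or None when absent


def parse_run_name(name: str) -> RunConfig:
    """Split a model name from the table into its training settings."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the inferred pattern: {name!r}")
    return RunConfig(
        base_model=match["base"],
        dataset=match["data"],
        learning_rate=float(match["lr"]),
        epochs=int(match["epochs"]),
        batch_size=int(match["batch"]),
        variant=match["variant"],
    )


def repo_urls(name: str) -> Dict[str, str]:
    """Rebuild the Hugging Face and ModelScope links used in the table."""
    return {
        "huggingface": f"https://huggingface.co/jasonrqh/{name}",
        "modelscope": f"https://modelscope.cn/models/nebularaid/{name}",
    }
```

For example, `parse_run_name("Qwen3-14B_Math-CoT-20k_lr1e-4_ep16_bs256_ConstLR")` yields a `RunConfig` with `epochs=16`, `batch_size=256`, and `variant="ConstLR"`.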
## Overview of Open-source Datasets
We provide the main datasets used in our experiments.
| Dataset Name | Description | Size | Hugging Face | ModelScope |
| --- | --- | --- | --- | --- |
| Math-CoT-20k | Verified long-CoT math reasoning data (default setting in the paper) | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/Math-CoT-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/Math-CoT-20k) |
| Math-NoCoT-20k | Math-CoT-20k with CoT traces removed (final summary/answer retained) | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/Math-NoCoT-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/Math-NoCoT-20k) |
| Countdown-CoT-20k | Countdown arithmetic-game long-CoT data for procedural transfer analysis | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/Countdown-CoT-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/Countdown-CoT-20k) |
| NuminaMath-20k | No-CoT math data with matched queries, sourced from NuminaMath-1.5 | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/NuminaMath-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/NuminaMath-20k) |
| DeepSeek-R1-20k | Verified long-CoT responses from DeepSeek-R1 on the same queries, sourced from the LUFFY dataset | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/DeepSeek-R1-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/DeepSeek-R1-20k) |
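
The dataset size of 20,480 lines up with the "fixed 640-step budget" named in the model list. The arithmetic below is a sketch under three assumptions: `bs` in the model names is the global batch size per optimizer step, the `2.5k` subset contains 2,560 samples (a value chosen here because it reproduces 640 steps, not one stated in this card), and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_samples: int, epochs: int, batch_size: int) -> int:
    """Number of optimizer steps when every epoch sees all samples once."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


# The three configurations listed under the 640-step budget.
full_repeated = optimizer_steps(20_480, epochs=8, batch_size=256)   # 80 * 8 = 640
subset_repeated = optimizer_steps(2_560, epochs=8, batch_size=32)   # 80 * 8 = 640 (subset size assumed)
full_one_pass = optimizer_steps(20_480, epochs=1, batch_size=32)    # 640 * 1 = 640

# For contrast, the short-training setting: one epoch at batch size 256.
short_run = optimizer_steps(20_480, epochs=1, batch_size=256)       # 80
```

All three budget-matched runs take the same number of steps but differ in how often each sample is seen: eight times for the first two and once for the last.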
## Citation
```bibtex
@article{ren2026rethinking_sft_generalization,
title={Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability},
author={Qihan Ren and Peng Wang and Ruikun Cai and Shuai Shao and Dadi Guo and Yuejin Xie and Yafu Li and Quanshi Zhang and Xia Hu and Jing Shao and Dongrui Liu},
journal={arXiv preprint arXiv:2604.06628},
year={2026}
}
```
Provider:
jasonrqh
Collected summary
Dataset introduction

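Construction
In research on reasoning-oriented supervised fine-tuning, Countdown-CoT-20k was built to analyze how procedural reasoning patterns generalize across domains. The dataset is derived from the classic Countdown arithmetic game and contains just over twenty thousand samples (20,480) with long chain-of-thought traces. Its construction relies on verified reasoning steps that model multi-step calculation and strategy planning, giving models structured examples from which to learn transferable reasoning logic.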
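
To illustrate what verifying a Countdown answer can involve, the sketch below checks a final arithmetic expression against the given numbers and target. It is not the dataset's actual verification pipeline, which this page does not describe. The rules assumed here are a common variant: only `+`, `-`, `*`, `/` are allowed, each given number may be used at most once, and fractional intermediate values are permitted. `check_countdown` is a hypothetical name.

```python
import ast
from collections import Counter
from fractions import Fraction
from typing import List, Sequence

_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node: ast.AST, used: List[int]) -> Fraction:
    """Evaluate an expression tree exactly, recording every integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression element")


def check_countdown(expression: str, numbers: Sequence[int], target: int) -> bool:
    """Return True if `expression` reaches `target` using only `numbers`."""
    used: List[int] = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    # Counter subtraction keeps only numbers used more often than available.
    if Counter(used) - Counter(numbers):
        return False
    return value == target
```

With the numbers `[25, 50, 3, 4]` and target `78`, the expression `"50 + 25 + 3"` passes, while `"25 * 3 + 3"` is rejected because it uses `3` twice. Parsing with `ast` instead of calling `eval` keeps the check restricted to plain arithmetic.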
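Characteristics
The dataset's most distinctive feature is its dense focus on procedural reasoning: every sample contains a complete chain of thought, from understanding the problem to the final answer. These long CoT traces record the intermediate reasoning steps in detail and exhibit higher-level strategies such as backtracking, which makes them well suited to studying how models internalize complex reasoning patterns. Compared with simple question-answer pairs, this structure supports closer analysis of generalization behavior, in particular cross-domain transfer.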
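Usage
In practice, Countdown-CoT-20k is used mainly for supervised fine-tuning of large language models in order to study how well they generalize on reasoning tasks. Researchers can use it as training data to examine how optimization strategy, data quality, and model capability affect what is learned. Comparing models fine-tuned on this dataset with models fine-tuned on No-CoT data shows how long-CoT data contributes to procedural reasoning ability, and also allows evaluation of asymmetric generalization effects such as accompanying changes in safety.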
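Background and Challenges
Background Overview
Countdown-CoT-20k emerged from research on fine-tuning the reasoning abilities of large language models, as one of the core resources of the paper "Rethinking Generalization in Reasoning SFT". Built by Qihan Ren and colleagues in 2026, it is intended to study how long-CoT data in supervised fine-tuning affects cross-domain generalization. The central research question is how reasoning data from a specifically structured arithmetic game, such as Countdown, can promote the learning of transferable procedural reasoning patterns inside the model. This in turn deepens understanding of the interaction between optimization dynamics, data quality, and model capability, and provides an empirical basis for analyzing generalization in reasoning SFT.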
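Current Challenges
The dataset targets a core challenge in complex reasoning tasks: insufficient cross-domain generalization, and specifically how a model under long-CoT supervised fine-tuning can learn and transfer abstract reasoning patterns such as backtracking. During construction, the main challenge is obtaining and verifying high-quality long-CoT traces: each reasoning step must be accurate and logically coherent so that low-quality data does not introduce misleading signals. Designing the data structure is a further difficulty, since the examples must preserve the original complexity of the arithmetic game while clearly exposing the procedural reasoning steps needed for rigorous analysis of generalization.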
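Common Scenarios
Classic Use Cases
In research on fine-tuning LLM reasoning, Countdown-CoT-20k is designed for studying cross-domain generalization mechanisms. It contains a large number of long-CoT traces generated from the Countdown arithmetic game, and its classic use is as material for controlled comparison experiments that analyze how well models internalize procedural reasoning patterns during supervised fine-tuning. By training on this structured reasoning data in parallel with math problem data, researchers can systematically evaluate transfer from game-specific logic to general mathematical reasoning, revealing how data structure and quality shape generalization.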
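Derived Related Work
Work derived from this dataset centers on empirical analysis of fine-tuning strategies. The original paper, "Rethinking Generalization in Reasoning SFT", systematically compares generalization under different data configurations (such as Math-CoT-20k and NuminaMath-20k) and establishes the key role of chain-of-thought quality in cross-domain transfer. Follow-up research could build on this to explore how optimization schedules and repeated data exposure interact, or extend the analysis to multimodal reasoning tasks. The open-sourced fine-tuned models (multiple variants from the Qwen3 series and InternLM2.5) also give the community a practical basis for testing generalization hypotheses.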
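Recent Research on the Dataset
Latest Research Directions
In LLM reasoning fine-tuning, Countdown-CoT-20k serves as a key tool for studying cross-domain generalization and highlights the interaction between data quality and model capability. Its long-CoT traces from the Countdown arithmetic game are used to study the transfer of procedural reasoning patterns. Current research focuses on the "dip-and-recovery" pattern in optimization dynamics: cross-domain performance drops briefly and then recovers and improves with extended training. The results indicate that high-quality long-CoT data promotes the internalization of transferable reasoning patterns, whereas low-quality or No-CoT data can limit generalization. The research also finds asymmetric generalization between reasoning and safety: long-CoT fine-tuning improves reasoning performance but can be accompanied by degraded safety, a tension that is now an active topic. These findings provide an empirical basis for designing more robust SFT strategies and for understanding model generalization.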
The content above was collected and summarized by 遇见数据集.



