jasonrqh/Math-NoCoT-20k
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/jasonrqh/Math-NoCoT-20k
Resource description:
---
language:
- "en"
license: "mit"
tags:
- "reasoning"
- "sft"
- "chain-of-thought"
---
# Rethinking Generalization in Reasoning SFT
This repository contains datasets associated with the paper "[Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability](https://huggingface.co/papers/2604.06628)".
The research investigates the factors influencing cross-domain generalization in Large Language Models (LLMs) during reasoning-focused supervised fine-tuning (SFT) with long chain-of-thought (CoT) data.
## Key Findings
- **Optimization Dynamics**: Cross-domain performance often follows a **dip-and-recovery** trajectory. Models may require extended training to reach maximum generalization.
- **Data Quality and Structure**: Verified long-CoT traces yield consistent cross-domain gains, whereas low-quality solutions or No-CoT data can lead to misleading signals or poor transfer.
- **Model Capability**: Stronger base models internalize transferable procedural reasoning patterns (such as backtracking) more effectively than weaker models.
- **Asymmetric Generalization**: The study finds that while reasoning capabilities improve through long-CoT SFT, model safety can simultaneously degrade. In contrast, No-CoT data leads to less reasoning improvement but better safety outcomes.
## Resources
- **Paper**: [arXiv:2604.06628](https://huggingface.co/papers/2604.06628)
- **Code**: [Official GitHub Repository](https://github.com/Nebularaid2000/rethink_sft_generalization)
- **Model Collection**: [Hugging Face Collection](https://huggingface.co/collections/jasonrqh/rethink-sft-generalization)
## Overview of Open-source Models
We have open-sourced **ALL** models trained in our experiments, including the **intermediate checkpoints** (you can find them in the `stepxxx` folder in the repo).
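
As a rough sketch of how the intermediate checkpoints might be located and fetched, the helper below filters a repo's file listing for top-level step folders. It assumes the `stepxxx` placeholder stands for `step` followed by digits, which this README does not confirm. The function names are hypothetical, and the two network helpers use `list_repo_files` and `snapshot_download` from `huggingface_hub`.

```python
import re
from typing import Iterable, List

# Assumption: checkpoint folders are named "step<digits>" at the repo root.
STEP_DIR = re.compile(r"^(step\d+)/")


def find_step_folders(paths: Iterable[str]) -> List[str]:
    """Return the distinct top-level step folders, sorted by step number."""
    folders = {m.group(1) for p in paths if (m := STEP_DIR.match(p))}
    return sorted(folders, key=lambda name: int(name[len("step"):]))


def list_checkpoints(repo_id: str) -> List[str]:
    """List the step folders of a model repo (requires network access)."""
    from huggingface_hub import list_repo_files  # imported lazily

    return find_step_folders(list_repo_files(repo_id))


def download_checkpoint(repo_id: str, step_folder: str) -> str:
    """Download a single step folder and return the local snapshot path."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(repo_id=repo_id, allow_patterns=[f"{step_folder}/*"])
```

Calling `list_checkpoints` on one of the model repos listed in this README and passing one of the returned folder names to `download_checkpoint` avoids pulling every checkpoint at once.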
Note that the following model list may include repeated entries, as it is organized by the experiments and conclusions presented in the paper.
| Model Name | Hugging Face | ModelScope |
| --- | --- | --- |
| **Weak cross-domain generalization is more pronounced under short training and smaller learning rates (refer to Sec. 3.1; App. C.1, Table 4)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs256) |
| Qwen3-14B_Math-CoT-20k_lr1e-5_ep1_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr1e-5_ep1_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr1e-5_ep1_bs256) |
| Qwen3-14B_Math-CoT-20k_lr1e-5_ep2_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr1e-5_ep2_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr1e-5_ep2_bs256) |
| **Apparent non-generalization can be an under-optimization artifact, with a dip-and-recovery pattern under extended training (refer to Sec. 3.1-3.2, Fig. 3)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| **The above optimization dynamics remain robust under a different teacher model (refer to App. C.2, Fig. 7)** | | |
| Qwen3-14B_DeepSeek-R1-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_DeepSeek-R1-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_DeepSeek-R1-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_DeepSeek-R1-20k_lr5e-5_ep8_bs256) |
| **Under a fixed 640-step budget, repeated exposure is more effective than one-pass coverage (refer to Sec. 3.3, Table 1)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-CoT-2.5k_lr5e-5_ep8_bs32 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-2.5k_lr5e-5_ep8_bs32) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-2.5k_lr5e-5_ep8_bs32) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs32 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs32) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep1_bs32) |
| **Overfitting symptoms emerge mainly under combined aggressive schedules (refer to Sec. 3.4, Fig. 4; App. C.4)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256_ConstLR | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256_ConstLR) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep16_bs256_ConstLR) |
| Qwen3-14B_Math-CoT-20k_lr1e-4_ep16_bs256_ConstLR | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr1e-4_ep16_bs256_ConstLR) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr1e-4_ep16_bs256_ConstLR) |
| **Training data quality and structure jointly shape generalization (refer to Sec. 4, Table 2)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Numina-Math-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Numina-Math-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Numina-Math-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Numina-Math-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Numina-Math-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Numina-Math-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Numina-Math-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Numina-Math-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Numina-Math-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| **Higher-capability models internalize transferable reasoning patterns more effectively and generalize better (refer to Sec. 5, Fig. 5)** | | |
| Qwen3-1.7B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-1.7B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-1.7B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-4B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-4B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-4B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| **The capability-dependent trend extends to another model family (refer to App. C.2/C.5, Fig. 8/14/15)** | | |
| Qwen2.5-1.5B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-1.5B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-1.5B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen2.5-3B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-3B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-3B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen2.5-7B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-7B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-7B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen2.5-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen2.5-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen2.5-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| **Asymmetric generalization: reasoning improves while safety degrades under long-CoT SFT (refer to Sec. 6, Fig. 6)** | | |
| Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-14B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-8B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-CoT-20k_lr5e-5_ep8_bs256) |
| InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/InternLM2.5-20B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| **Appendix: smaller and mid-scale models across data configurations (refer to App. D)** | | |
| Qwen3-1.7B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-1.7B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-1.7B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-1.7B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-1.7B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-1.7B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-4B_Countdown-CoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-4B_Countdown-CoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-4B_Countdown-CoT-20k_lr5e-5_ep8_bs256) |
| Qwen3-4B_Math-NoCoT-20k_lr5e-5_ep8_bs256 | [Hugging Face](https://huggingface.co/jasonrqh/Qwen3-4B_Math-NoCoT-20k_lr5e-5_ep8_bs256) | [ModelScope](https://modelscope.cn/models/nebularaid/Qwen3-4B_Math-NoCoT-20k_lr5e-5_ep8_bs256) |
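
The model names in the table above appear to follow the pattern `<base>_<dataset>_lr<rate>_ep<epochs>_bs<batch>`, optionally suffixed with `_ConstLR`. The sketch below parses such names and checks the fixed 640-step budget. It assumes that `bs` is the global batch size, and that Math-CoT-2.5k contains 2,560 examples (a size this README does not state). `RunConfig`, `parse_run`, and `optimizer_steps` are illustrative names, not part of the official code.

```python
import math
import re
from dataclasses import dataclass

NAME = re.compile(
    r"^(?P<base>.+?)_(?P<data>.+?)_lr(?P<lr>[0-9.e-]+)"
    r"_ep(?P<epochs>\d+)_bs(?P<batch>\d+)(?P<const>_ConstLR)?$"
)

DATASET_SIZES = {
    "Math-CoT-20k": 20_480,  # from the dataset table in this README
    "Math-CoT-2.5k": 2_560,  # assumption: size not stated in this README
}


@dataclass(frozen=True)
class RunConfig:
    base: str
    data: str
    lr: float
    epochs: int
    batch: int
    constant_lr: bool


def parse_run(name: str) -> RunConfig:
    """Split a model name into its training hyperparameters."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized run name: {name}")
    return RunConfig(
        base=m["base"],
        data=m["data"],
        lr=float(m["lr"]),
        epochs=int(m["epochs"]),
        batch=int(m["batch"]),
        constant_lr=m["const"] is not None,
    )


def optimizer_steps(run: RunConfig) -> int:
    """Steps per epoch (rounded up) times epochs; assumes bs is the global batch size."""
    return math.ceil(DATASET_SIZES[run.data] / run.batch) * run.epochs
```

Under these assumptions, the three runs listed for Sec. 3.3 (`ep8_bs256` on 20k, `ep8_bs32` on 2.5k, and `ep1_bs32` on 20k) each come out to 640 optimizer steps.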
## Overview of Open-source Datasets
We provide the main datasets used in our experiments.
| Dataset Name | Description | Size | Hugging Face | ModelScope |
| --- | --- | --- | --- | --- |
| Math-CoT-20k | Verified long-CoT math reasoning data (default setting in the paper) | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/Math-CoT-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/Math-CoT-20k) |
| Math-NoCoT-20k | Math-CoT-20k with CoT traces removed (final summary/answer retained) | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/Math-NoCoT-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/Math-NoCoT-20k) |
| Countdown-CoT-20k | Countdown arithmetic-game long-CoT data for procedural transfer analysis | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/Countdown-CoT-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/Countdown-CoT-20k) |
| NuminaMath-20k | No-CoT math data with matched queries, sourced from NuminaMath-1.5 | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/NuminaMath-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/NuminaMath-20k) |
| DeepSeek-R1-20k | Verified long-CoT responses from DeepSeek-R1 on the same queries, sourced from the LUFFY dataset | 20,480 | [Hugging Face](https://huggingface.co/datasets/jasonrqh/DeepSeek-R1-20k) | [ModelScope](https://modelscope.cn/datasets/nebularaid/DeepSeek-R1-20k) |
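
A minimal sketch for loading one of the datasets above from Hugging Face with the `datasets` library. The repo IDs are taken from the table. The helper names are hypothetical, and the split and column names are not documented here, so inspect the returned object before relying on any field.

```python
HF_OWNER = "jasonrqh"

# Dataset names as listed in the table above.
DATASETS = (
    "Math-CoT-20k",
    "Math-NoCoT-20k",
    "Countdown-CoT-20k",
    "NuminaMath-20k",
    "DeepSeek-R1-20k",
)


def hf_repo_id(name: str) -> str:
    """Build the Hugging Face repo ID for a dataset listed in this README."""
    if name not in DATASETS:
        raise KeyError(f"unknown dataset: {name}")
    return f"{HF_OWNER}/{name}"


def load(name: str):
    """Load a dataset from the Hub (requires `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily

    return load_dataset(hf_repo_id(name))
```

For example, `load("Math-NoCoT-20k")` should return an object whose splits can be printed to check the available columns and to compare the row count against the 20,480 listed in the table.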
## Citation
```bibtex
@article{ren2026rethinking_sft_generalization,
title={Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability},
author={Qihan Ren and Peng Wang and Ruikun Cai and Shuai Shao and Dadi Guo and Yuejin Xie and Yafu Li and Quanshi Zhang and Xia Hu and Jing Shao and Dongrui Liu},
journal={arXiv preprint arXiv:2604.06628},
year={2026}
}
```
Provided by:
jasonrqh
Collected Summary
Dataset Introduction

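Construction
In the field of mathematical reasoning, the Math-NoCoT-20k dataset was built to investigate the effect of chain-of-thought (CoT) data. It is derived from Math-CoT-20k by systematically removing the detailed reasoning steps and intermediate work from the original data, keeping only the final answer or summary. The result is a control dataset without explicit reasoning traces, used to study how data structure and quality affect the generalization and safety of large language models during supervised fine-tuning (SFT).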
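
The removal step described above could look roughly like the sketch below. It assumes the reasoning trace is wrapped in `<think>...</think>` tags, which this page does not confirm, so treat it as an illustration rather than the authors' actual procedure. `strip_cot` is a hypothetical helper.

```python
import re

# Assumption: the CoT trace is delimited by <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_cot(response: str) -> str:
    """Remove any <think>...</think> span and return the remaining text."""
    return THINK_BLOCK.sub("", response).strip()
```

A response without a trace passes through unchanged, so the same function can be applied to every example in a dataset.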
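Characteristics
The dataset's defining feature is its CoT-free structure, which offers a distinct angle for studying where reasoning ability comes from. Compared with data that contains detailed reasoning steps, Math-NoCoT-20k trains models to output answers directly, which may change their cross-domain generalization patterns. The study reports that models trained on this kind of data gain relatively little reasoning ability but tend to preserve safety better, an asymmetric generalization effect.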
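Usage
In practice, Math-NoCoT-20k is used mainly in comparative experiments that assess how data format affects model performance. Researchers can pair it with datasets that contain full CoT traces, such as Math-CoT-20k, and use controlled-variable comparisons to analyze the interaction between optimization dynamics, data quality, and model capability. The dataset can be loaded directly from Hugging Face or ModelScope and used in an SFT pipeline for large language models to explore the balance between reasoning generalization and safety.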
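Background and Challenges
Background Overview
Within research on improving the reasoning ability of large language models, Math-NoCoT-20k serves as a key control resource. It is part of the experimental setup of the 2026 paper "Rethinking Generalization in Reasoning SFT" and was created by Qihan Ren and colleagues. Its central research question is how structural properties of the training data, in particular the absence of a chain of thought, affect a model's generalization on cross-domain reasoning tasks during supervised fine-tuning. By systematically removing the detailed reasoning steps from Math-CoT-20k and keeping only the final answers, the dataset provides a precise control baseline for examining how data quality and structure shape a model's internalization of transferable reasoning patterns, and it deepens the understanding of how data efficiency relates to the limits of generalization in reasoning fine-tuning.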
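Current Challenges
The domain challenge this dataset addresses is to expose the inherent limits and potential risks of CoT-free data for improving reasoning generalization. The study finds that, compared with long-CoT data, No-CoT data yields better safety outcomes but smaller gains in reasoning ability, and it can give misleading signals about cross-domain generalization. On the construction side, the main challenge is keeping the control experiment rigorous: CoT traces must be removed cleanly from the source data while the pairing between each question and its final answer is fully preserved, producing a clean set of question-answer pairs. Matching dataset size and validating quality are also essential: the dataset must match its companion CoT dataset exactly in question distribution and difficulty for causal inference to be valid.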
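Common Scenarios
Typical Use Cases
In research on fine-tuning large language models for reasoning, Math-NoCoT-20k is commonly used as the control condition. It removes the CoT steps from math reasoning tasks and keeps only the final answers, providing a clean baseline for studying how data structure and quality affect generalization. By comparing it against datasets with full CoT traces, researchers can measure the role that step-by-step reasoning plays in supervised fine-tuning and thereby shed light on how a model's internal knowledge representations form.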
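Academic Problems Addressed
The dataset helps address academic disputes in reasoning fine-tuning over the effects of data quality and structure. By providing math problem samples without CoT traces, it lets researchers separate answer memorization from process learning and clarifies how low-quality data can create an illusion of generalization. Its significance lies in advancing a systematic understanding of how optimization dynamics, model capability, and data properties interact in supervised fine-tuning, laying an empirical foundation for building more robust reasoning models.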
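Derived Related Work
Research based on this dataset has led to further studies of the asymmetric generalization phenomenon. Related work analyzes the training dynamics of CoT-free data at different model scales and extends to cross-domain tasks such as code generation and logical reasoning. These studies build a framework connecting data configuration, model capability, and generalization performance, and they have motivated later algorithmic work on efficient fine-tuning strategies and multi-objective optimization.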
The content above was collected and summarized by 遇见数据集.



