OmniAI-ZJU/NuminaMath-Cot-Distillation-100K

Name: OmniAI-ZJU/NuminaMath-Cot-Distillation-100K
Creator: OmniAI-ZJU
Published: 2026-04-20 13:41:38
License: No description available

Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/OmniAI-ZJU/NuminaMath-Cot-Distillation-100K


Resource summary:

# NuminaMath-Cot-Distillation-100K: A Distilled Reasoning Dataset for Group Fine-Tuning

## 💡 Dataset Summary

**NuminaMath-Cot-Distillation-100K** is a high-quality instruction-tuning dataset designed specifically for mathematical reasoning. It is the official dataset released with the paper **"GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification,"** which has been accepted to **ACL 2026 Findings**.

The dataset targets two problems inherent in standard Supervised Fine-Tuning (SFT), "single-path dependency" and "entropy collapse", by providing multiple diverse reasoning trajectories for each mathematical problem. It serves as the foundational training data for models within the **Group Fine-Tuning (GFT)** framework.

## 📚 Dataset Lineage

This dataset is derived from and extends the open-source [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) dataset. NuminaMath-CoT covers a wide range of mathematical problems, from high-school exercises to international olympiad-level problems.

## 🛠️ Construction Method

Following the methodology described in the GFT paper, we applied the following steps to build this 100K version:

* **Data Sampling**: We randomly sampled **100,000 (100K)** unique mathematical problems from the original NuminaMath-CoT corpus.
* **Expert Trace Preservation**: For each problem, we retained the original chain-of-thought (CoT) reasoning path as the "Expert Demonstration" ($y_{exp}$).
* **Multi-Path Teacher Distillation**:
  * **Teacher Model**: We used **Qwen-2.5-Math-72B** as the teacher model to introduce diverse reasoning paradigms.
  * **Response Generation**: For every problem in the 100K subset, we generated **8 distilled responses** with the teacher model.
  * **Rationale**: These diverse teacher outputs ($y_{demo}$) are integrated into a hybrid response group to break single-path dependency and to provide comparative signals for advantage-based learning.

## 🚀 Usage & Purpose

This dataset is intended for:

* **GFT Training**: Building hybrid response groups for Group Advantage Learning (GAL) and Dynamic Coefficient Rectification (DCR).
* **RL Alignment**: Providing a stronger cold-start initialization for subsequent reinforcement learning (e.g., GRPO), raising the attainable performance ceiling.
* **Diversity Analysis**: Enabling researchers to analyze solution coverage and variety in mathematical reasoning.

## 📖 Citation

If you use this dataset in your research, please cite our ACL paper:

```bibtex
@article{gan2026gft,
  title={GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification},
  author={Gan, Wangjie and Pan, Miao and Xi, Linbo and Zhang, Wenqi and Chen, Jintao and Yin, Jianwei and Zhang, Xuhong},
  journal={arXiv preprint arXiv:2604.14258},
  year={2026}
}
```
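The steps listed under **Construction Method** can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`problem`, `expert`, `demos`), the helper names (`sample_problems`, `build_group`, `total_responses`) and the `teacher` callable are all hypothetical, and the toy teacher merely stands in for Qwen-2.5-Math-72B. It shows random sampling without replacement, assembly of one hybrid group per problem (the expert trace $y_{exp}$ plus 8 teacher responses $y_{demo}$), and the resulting count of 9 responses per problem, i.e. 900,000 responses for 100K problems.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


# Hypothetical record layout: field names are illustrative assumptions,
# not the dataset's published schema.
@dataclass
class HybridGroup:
    problem: str
    expert: str          # y_exp: the original NuminaMath-CoT chain of thought
    demos: List[str]     # y_demo: teacher-distilled responses

    def responses(self) -> List[str]:
        """Return the full group: expert trace first, then teacher outputs."""
        return [self.expert, *self.demos]


def sample_problems(corpus: Sequence[Tuple[str, str]], k: int, seed: int = 0):
    """Draw k (problem, expert_solution) pairs without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(corpus), k)


def build_group(problem: str, expert: str,
                teacher: Callable[[str], str], n_demos: int = 8) -> HybridGroup:
    """Query a (hypothetical) teacher callable n_demos times for one problem."""
    demos = [teacher(problem) for _ in range(n_demos)]
    return HybridGroup(problem, expert, demos)


def total_responses(n_problems: int, n_demos: int = 8) -> int:
    """Each problem contributes 1 expert trace plus n_demos teacher responses."""
    return n_problems * (1 + n_demos)


if __name__ == "__main__":
    # Toy corpus standing in for NuminaMath-CoT.
    corpus = [(f"problem {i}", f"expert solution {i}") for i in range(20)]
    # Toy stand-in for a real teacher model; it returns identical strings.
    toy_teacher = lambda p: f"teacher answer to {p}"
    groups = [build_group(p, e, toy_teacher) for p, e in sample_problems(corpus, 5)]
    print(len(groups), len(groups[0].responses()))  # 5 9
    print(total_responses(100_000))                 # 900000
```

With a real teacher model, the 8 responses would typically be drawn with a nonzero sampling temperature so that they differ; the toy teacher above returns the same string each time and only illustrates the group structure.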

Provided by:

OmniAI-ZJU
