OmniAI-ZJU/NuminaMath-Cot-Distillation-100K

Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/OmniAI-ZJU/NuminaMath-Cot-Distillation-100K
Resource description:
# NuminaMath-Cot-Distillation-100K: A Distilled Reasoning Dataset for Group Fine-Tuning

## 💡 Dataset Summary

**NuminaMath-Cot-Distillation-100K** is a high-quality instruction-tuning dataset specifically designed for mathematical reasoning tasks. It is the official dataset released for the paper **"GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification,"** which has been accepted to **ACL 2026 Findings**.

This dataset is engineered to address the "single-path dependency" and "entropy collapse" issues inherent in standard Supervised Fine-Tuning (SFT) by providing diverse reasoning trajectories for each mathematical problem. It serves as the foundational data for training models within the **Group Fine-Tuning (GFT)** framework.

## 📚 Dataset Lineage

This dataset is a secondary development and enhancement of the open-source [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) dataset. NuminaMath-CoT provides a wide range of mathematical challenges, from high school exercises to international olympiad-level problems.

## 🛠️ Construction Method

Following the methodology described in the GFT paper, we applied the following modifications to create this 100K version:

* **Data Sampling**: We randomly sampled **100,000 (100K)** unique mathematical problems from the original NuminaMath-CoT corpus.
* **Expert Trace Preservation**: For each problem, we retained the original chain-of-thought (CoT) reasoning path to serve as the "Expert Demonstration" ($y_{exp}$).
* **Multi-Path Teacher Distillation**:
  * **Teacher Model**: We utilized **Qwen-2.5-Math-72B** as the powerful teacher model to introduce diverse reasoning paradigms.
  * **Response Generation**: For every problem in the 100K subset, we generated **8 distilled responses** using the teacher model.
  * **Rationale**: These diverse teacher outputs ($y_{demo}$) are integrated into a hybrid response group to break single-path dependency and provide comparative signals for advantage-based learning.

## 🚀 Usage & Purpose

This dataset is optimized for:

* **GFT Training**: Supporting the construction of hybrid response groups to perform Group Advantage Learning (GAL) and Dynamic Coefficient Rectification (DCR).
* **RL Alignment**: Providing a superior cold-start initialization for subsequent reinforcement learning (e.g., GRPO), raising the attainable performance ceiling.
* **Diversity Analysis**: Enabling researchers to analyze solution coverage and solution variety in mathematical reasoning.

## 📖 Citation

If you use this dataset in your research, please cite our ACL paper:

```bibtex
@article{gan2026gft,
  title={GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification},
  author={Gan, Wangjie and Pan, Miao and Xi, Linbo and Zhang, Wenqi and Chen, Jintao and Yin, Jianwei and Zhang, Xuhong},
  journal={arXiv preprint arXiv:2604.14258},
  year={2026}
}
```
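
As a rough illustration of the hybrid response group described in the construction method, the sketch below combines one expert demonstration with the 8 teacher responses for a problem and computes a simple mean-centered score per response. It is a minimal sketch under stated assumptions: the `Response` class, the `source` labels, the function names, and the 0/1 rewards are hypothetical, and the mean-centered baseline is a generic stand-in, not the unbiased group advantage estimator defined in the GFT paper.

```python
from dataclasses import dataclass
from statistics import mean

NUM_TEACHER_RESPONSES = 8  # the dataset card states 8 distilled responses per problem


@dataclass
class Response:
    text: str
    source: str  # hypothetical labels: "expert" for y_exp, "teacher" for y_demo


def build_hybrid_group(expert_cot: str, teacher_responses: list[str]) -> list[Response]:
    """Combine one expert trace with the teacher responses into a single group."""
    if len(teacher_responses) != NUM_TEACHER_RESPONSES:
        raise ValueError(
            f"expected {NUM_TEACHER_RESPONSES} teacher responses, "
            f"got {len(teacher_responses)}"
        )
    group = [Response(expert_cot, "expert")]
    group.extend(Response(text, "teacher") for text in teacher_responses)
    return group


def mean_centered_advantages(rewards: list[float]) -> list[float]:
    """Generic group baseline: each reward minus the group mean.

    This is an assumption made for illustration, not the GFT estimator.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Placeholder strings stand in for real solutions.
group = build_hybrid_group(
    "expert solution",
    [f"teacher solution {i}" for i in range(NUM_TEACHER_RESPONSES)],
)

# Hypothetical 0/1 correctness rewards, one per response in the group.
rewards = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
advantages = mean_centered_advantages(rewards)
```

Responses that score above the group average get a positive value and the rest a negative one, which is the kind of comparative signal a single reference solution per problem cannot provide.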
Provider:
OmniAI-ZJU