MiroMind-M1-SFT-719K

Name: MiroMind-M1-SFT-719K
Creator: maas
Published: 2026-05-07 11:16:31
License: No description available

ModelScope community · Updated 2026-05-07 · Indexed 2025-08-16

Download link:

https://modelscope.cn/datasets/okwinds/MiroMind-M1-SFT-719K
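
The dataset at the link above can be fetched with Git or with the ModelScope SDK. The following is a minimal sketch, not an official recipe: the helper names `git_clone_command` and `load_with_sdk` are hypothetical, the `.git` clone suffix follows ModelScope's usual convention and is assumed here, and the `"train"` split name is an assumption that should be checked against the dataset's file listing.

```python
# Dataset identifier (namespace/name) taken from the download link above.
DATASET_ID = "okwinds/MiroMind-M1-SFT-719K"


def git_clone_command(dataset_id: str = DATASET_ID) -> str:
    """Build a `git clone` command for a ModelScope dataset repository.

    Assumption: ModelScope serves dataset repos at
    https://www.modelscope.cn/datasets/<namespace>/<name>.git
    """
    return f"git clone https://www.modelscope.cn/datasets/{dataset_id}.git"


def load_with_sdk(dataset_id: str = DATASET_ID, split: str = "train"):
    """Load the dataset through the ModelScope SDK.

    Requires `pip install modelscope` and network access.
    The default split name "train" is an assumption, not taken from the source.
    """
    # Imported lazily so the rest of this sketch runs without the SDK installed.
    from modelscope.msdatasets import MsDataset

    return MsDataset.load(dataset_id, split=split)


if __name__ == "__main__":
    # Print the command instead of running it; the download is large.
    print(git_clone_command())
```

Calling `load_with_sdk()` triggers the actual download, so it is left out of the `__main__` block on purpose.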


Resource overview:

This dataset is mirrored from the Hugging Face account [miromind-ai](https://huggingface.co/miromind-ai).

#### 📖 For background on the research behind this project, see the article from the WeChat official account "觉察流" 👇

[MiroMind-M1: How the CAMPO Algorithm Builds an Efficient, Reproducible, Full-Stack Open-Source Reasoning Model](https://mp.weixin.qq.com/s/REPzzgsUjDMikg4jIo9KRg)

#### _Scan the QR code to find the maintainer of this repository 👇🏻_

<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />

---

For the dataset's file metadata and data files, see the "Dataset Files" page. You can download the dataset with the Git clone command or the ModelScope SDK shown below.

#### Download methods

:modelscope-code[]{type="sdk"}

:modelscope-code[]{type="git"}

# Official introduction

<div align="center">
<img src="assets/MiromindAI_H.svg" width="50%" alt="MiroMindM1" />
</div>

<div align="center">

[![Models](https://img.shields.io/badge/Models-5EDDD2?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://www.modelscope.cn/models/okwidns/MiroMind-M1-RL-7B)
[![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)
[![Paper](https://img.shields.io/badge/Paper-000000?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2507.14683)
[![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/MiroMindAsia/MiroMind-M1)
[![Website](https://img.shields.io/badge/Website-000000?style=for-the-badge&logo=google-chrome&logoColor=white)](https://miromind.ai/)

</div>

# MiroMind-M1

## 🧾 Overview

<div align="center">
<img src="assets/7b_performance_training.png" width="80%" alt="7B Model Training Performance" />
<p><i>Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.</i></p>
</div>

**MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5` and focused on advancing mathematical reasoning. The models are trained in two stages: supervised fine-tuning (**SFT**) on 719K curated problems, followed by reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500. All models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups are openly released.

## 📊 Evaluation

### MiroMind-M1-SFT

| Model | Initial Checkpoint | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|------------------|----------------------------|--------|--------|---------|
| DeepSeek-R1-Distill | Qwen2.5-Math-7B | 55.5 | 40.4† | 92.8 |
| OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
| Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
| Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
| **MiroMind-M1-SFT-7B** | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |

*† The AIME25 score for DeepSeek-R1-Distill comes from our own evaluation.*

### MiroMind-M1-RL

| Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|----------------------------------|--------|--------|---------|
| DeepSeek-R1 | 79.8 | 70.0 | – |
| DeepSeek-R1-0528 | 91.4 | 87.5 | – |
| Qwen3-8B | 76.0 | 67.3 | – |
| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
| ***32B models trained from the Qwen2.5 series*** | | | |
| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4 |
| ***7B models trained from the Qwen2.5 series*** | | | |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |
| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | – |
| Skywork-OR1-7B | 72.2 | 54.6 | – |
| **MiroMind-M1-RL-7B** | 73.4 | 57.8 | 96.7 |

## 🔗 Resources

### Models

[`MiroMind-M1-SFT-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-SFT-7B)<br>
[`MiroMind-M1-RL-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B)<br>
[`MiroMind-M1-RL-32B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-32B)<br>

### Data

[`MiroMind-M1-SFT-719K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-SFT-719K)<br>
[`MiroMind-M1-RL-62K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)<br>
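
The evaluation tables above report scores as avg@64 and avg@5. The source does not define the metric, so the sketch below assumes the common reading: for each problem, sample k responses, take the fraction that are correct, then average that fraction over all problems and express it as a percentage. The function name `avg_at_k` and the toy data are illustrative only, not part of the MiroMind-M1 evaluation code.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Mean accuracy (in percent) over k sampled responses per problem.

    correct[i][j] is True when the j-th sampled response to problem i
    was judged correct. Every problem must have the same number k of samples.
    """
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem needs the same non-zero number of samples")
    # Per-problem accuracy: share of the k samples that are correct.
    per_problem = [sum(row) / k for row in correct]
    # Benchmark score: average of the per-problem accuracies, as a percentage.
    return 100.0 * mean(per_problem)


# Toy example: 2 problems, k = 4 samples each (hypothetical grading results).
toy_results = [
    [True, True, False, True],    # 3 of 4 correct
    [False, False, True, False],  # 1 of 4 correct
]
toy_score = avg_at_k(toy_results)  # (0.75 + 0.25) / 2 * 100 = 50.0
```

Averaging over many samples reduces the variance of the score on small benchmarks, which is a plausible reason for using a larger k on AIME than on MATH500.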


Provider:

maas

Created:

2025-08-10
