MiroMind-M1-SFT-719K
Source: ModelScope community (魔搭社区) · Updated 2026-05-07 · Indexed 2025-08-16
Download link:
https://modelscope.cn/datasets/okwinds/MiroMind-M1-SFT-719K
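Resource description:
This dataset is mirrored from Hugging Face: [miromind-ai](https://huggingface.co/miromind-ai)
#### 📖 For research related to this project, see the article from the WeChat official account “觉察流” 👇
[MiroMind-M1: How the CAMPO Algorithm Builds an Efficient and Reproducible Full-Stack Open-Source Reasoning Model](https://mp.weixin.qq.com/s/REPzzgsUjDMikg4jIo9KRg)
#### _Repository maintainer: scan the QR code below 👇🏻_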
<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />
---
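The dataset file metadata and the data files are available on the “Dataset Files” page.
You can download the dataset with a Git clone command or with the ModelScope SDK.
#### Download methods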
:modelscope-code[]{type="sdk"}
:modelscope-code[]{type="git"}
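
The two download routes named above can be sketched in Python as follows. This is a minimal sketch, not the page's official snippet: the helper names `git_clone_command` and `load_with_sdk` are hypothetical, the Git URL assumes the usual `https://www.modelscope.cn/datasets/<id>.git` layout, and the `"train"` split is an assumption about this dataset that may need adjusting.

```python
from typing import Any

DATASET_ID = "okwinds/MiroMind-M1-SFT-719K"


def git_clone_command(dataset_id: str = DATASET_ID) -> str:
    """Build a Git clone command for a ModelScope dataset repository (hypothetical helper)."""
    # Assumption: dataset repositories are served at /datasets/<id>.git
    return f"git clone https://www.modelscope.cn/datasets/{dataset_id}.git"


def load_with_sdk(dataset_id: str = DATASET_ID, split: str = "train") -> Any:
    """Load the dataset through the ModelScope SDK (requires `pip install modelscope`)."""
    # Imported lazily so the Git helper above works without the SDK installed.
    from modelscope.msdatasets import MsDataset

    # Assumption: the dataset exposes a "train" split; change `split` if it does not.
    return MsDataset.load(dataset_id, split=split)


print(git_clone_command())
```

Calling `load_with_sdk()` needs network access and the `modelscope` package; the printed Git command can be run in a shell instead if you only need the raw files.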
# Official Introduction
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="assets/MiromindAI_H.svg" width="50%" alt="MiroMindM1" />
</div>
<!-- <hr> -->
<div align="center">
[Models](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B)
[Data](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)
[Paper](https://arxiv.org/abs/2507.14683)
[GitHub](https://github.com/MiroMindAsia/MiroMind-M1)
[Website](https://miromind.ai/)
</div>
# MiroMind-M1
## 🧾 Overview
<div align="center">
<img src="assets/7b_performance_training.png" width="80%" alt="7B Model Training Performance" />
<p><i>Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.</i></p>
</div>
**MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5`, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (**SFT**) on 719K curated problems and reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples, using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups openly released.
## 📊 Evaluation
### MiroMind-M1-SFT
| Model | Initial Checkpoint | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|------------------|----------------------------|--------|--------|---------|
| DeepSeek-R1-Distill | Qwen2.5-Math-7B | 55.5 | 40.4† | 92.8 |
| OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
| Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
| Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
| **MiroMind-M1-SFT-7B** | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |

*† The AIME25 score for DeepSeek-R1-Distill comes from our own evaluation.*
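
The table headers report `avg@64` and `avg@5`, which this page does not define. Under the common reading (an assumption here), avg@k is the accuracy averaged over k independently sampled responses per problem, then averaged over all problems. A minimal sketch with made-up toy data, where `avg_at_k` is a hypothetical helper name:

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Return accuracy in percent, averaged over k samples per problem and over problems."""
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem needs the same non-zero number of samples")
    # Fraction of correct samples for each problem.
    per_problem = [sum(row) / k for row in correct]
    # Mean over problems, expressed as a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data (made up, not from the benchmark): 2 problems, k = 4 samples each.
toy = [
    [True, True, False, True],    # 3 of 4 samples correct
    [False, False, True, False],  # 1 of 4 samples correct
]
print(avg_at_k(toy))  # 50.0
```

Averaging over many samples reduces the variance that a single sampled response would introduce on small benchmarks such as AIME, which is presumably why a larger k is used there than for MATH500.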
### MiroMind-M1-RL
| Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|----------------------------------|--------|--------|---------|
| DeepSeek-R1 | 79.8 | 70.0 | – |
| DeepSeek-R1-0528 | 91.4 | 87.5 | – |
| Qwen3-8B | 76.0 | 67.3 | – |
| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
| ***32B models trained from the Qwen2.5 series*** | | | |
| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4 |
| ***7B models trained from the Qwen2.5 series*** | | | |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |
| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | – |
| Skywork-OR1-7B | 72.2 | 54.6 | – |
| **MiroMind-M1-RL-7B** | 73.4 | 57.8 | 96.7 |
## 🔗 Resources
### Models
[`MiroMind-M1-SFT-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-SFT-7B)<br>
[`MiroMind-M1-RL-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B)<br>
[`MiroMind-M1-RL-32B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-32B)<br>
### Data
[`MiroMind-M1-SFT-719K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-SFT-719K)<br>
[`MiroMind-M1-RL-62K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)<br>
Provider: maas
Created: 2025-08-10
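Dataset introduction

Background overview
MiroMind-M1-SFT-719K is an open-source dataset focused on mathematical reasoning and is part of the MiroMind-M1 series built on Qwen-2.5. It contains 719K curated problems used for supervised fine-tuning (SFT), with the aim of improving model performance on benchmarks such as AIME24, AIME25, and MATH500.
The summary above was collected and generated by 遇见数据集.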



