miromind-ai/MiroMind-M1-RL-62K

Name: miromind-ai/MiroMind-M1-RL-62K
Creator: miromind-ai
Published: 2025-07-22 01:44:08
License: No description available

Hugging Face | Updated: 2025-07-22 | Indexed: 2025-08-09

Download link:

https://hf-mirror.com/datasets/miromind-ai/MiroMind-M1-RL-62K
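The link above points at a Hugging Face mirror. A minimal sketch of loading the dataset through that mirror follows, assuming the third-party `datasets` library is installed and that the mirror honors the `HF_ENDPOINT` environment variable read by `huggingface_hub`. The helper names `dataset_page_url` and `load_rl_dataset` are hypothetical, and no split or column names are assumed because the listing does not state them.

```python
import os

# Assumption: route Hugging Face Hub traffic through the mirror from the download link.
# This must be set before `datasets` / `huggingface_hub` is imported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = "miromind-ai/MiroMind-M1-RL-62K"


def dataset_page_url(repo_id: str, endpoint: str) -> str:
    """Build the dataset page URL for a given Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_rl_dataset():
    """Download and load the dataset (hypothetical helper; needs network access)."""
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Returns whatever splits the repository defines; none are assumed here.
    return load_dataset(REPO_ID)
```

Calling `load_rl_dataset()` and printing the result shows which splits and columns the repository actually provides, which is worth checking before writing any downstream processing.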

Resource overview:

MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5 and focused on advancing mathematical reasoning. The models are trained with supervised fine-tuning (SFT) on 719K curated problems and with reinforcement learning with verifiable rewards (RLVR) on 62K challenging examples, using a context-aware multi-stage policy optimization method (CAMPO). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500. All models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B), data (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K), and training setups are openly released.

Provider:

miromind-ai
