miromind-ai/MiroMind-M1-RL-62K
Source: Hugging Face. Updated 2025-07-22; indexed 2025-08-09.
Download link:
https://hf-mirror.com/datasets/miromind-ai/MiroMind-M1-RL-62K
Resource summary:
MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5 and focused on advancing mathematical reasoning. The models are trained in two stages: supervised fine-tuning (SFT) on 719K curated problems, followed by reinforcement learning with verifiable rewards (RLVR) on 62K challenging examples using a context-aware multi-stage policy optimization method (CAMPO). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500. All models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B), data (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K), and training setups are openly released.
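The RLVR stage relies on rewards that can be checked automatically against a known answer. As a minimal sketch, assuming each model completion states its final answer inside LaTeX `\boxed{...}` and that an exact match after whitespace stripping is an acceptable check, a binary verifiable reward could look like the following. The function names are hypothetical, and this is not the verifier used to train MiroMind-M1.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None  # no boxed answer at all
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # matching close brace of \boxed{
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(ans: str) -> str:
    """Light normalization (an assumption): drop all whitespace and a trailing period."""
    return re.sub(r"\s+", "", ans).rstrip(".")


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the final boxed answer matches the reference."""
    pred = extract_boxed(completion)
    if pred is None:
        return 0.0
    return 1.0 if normalize(pred) == normalize(reference) else 0.0


print(verifiable_reward("So the answer is \\boxed{\\frac{1}{2}}.", "\\frac{1}{2}"))
```

Practical math verifiers usually add symbolic equivalence checks (for example, treating `0.5` and `\frac{1}{2}` as equal); this sketch deliberately scores such pairs as different.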
Provided by:
miromind-ai

The above content was collected and summarized by 遇见数据集.



