miromind-ai/MiroMind-M1-SFT-719K
Source: Hugging Face | Updated: 2025-07-22 | Indexed: 2025-08-09
Download link:
https://hf-mirror.com/datasets/miromind-ai/MiroMind-M1-SFT-719K
Description:
MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5 and focused on advancing mathematical reasoning. The models are trained with supervised fine-tuning (SFT) on 719K curated problems and with reinforcement learning with verifiable rewards (RLVR) on 62K challenging examples, using Context-Aware Multi-Stage Policy Optimization (CAMPO). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500.
Provider:
miromind-ai



