AM-Thinking-v1-RL-Dataset

Name: AM-Thinking-v1-RL-Dataset
Creator: maas
Published: 2026-05-23 20:52:15
License: No description available

ModelScope community · updated 2026-05-23 · indexed 2025-05-24

Download link:

https://modelscope.cn/datasets/a-m-team/AM-Thinking-v1-RL-Dataset


Description:

<p align="center"> 🤗 <a href="https://huggingface.co/collections/a-m-team/am-thinking-v1-682b2247b8143767802056a6">AM-Thinking-v1 Collections</a>&nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://arxiv.org/abs/2505.08311">Paper</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://a-m-team.github.io/am-thinking-v1/">Blog</a> &nbsp;&nbsp; </p>

## 🚀 Introduction

We release the **Math & Code RL training dataset** used to build [AM-Thinking-v1](https://huggingface.co/a-m-team/AM-Thinking-v1), a 32B dense language model designed for high-level reasoning.

AM-Thinking-v1 is built on top of **Qwen 2.5-32B-Base** and demonstrates strong performance on math and code reasoning tasks, rivaling much larger models such as **Qwen3‑235B‑A22B** and **Seed1.5-Thinking**, while remaining deployable on a single A100 (80GB).

<div style="text-align: center;">
<img src="assets/benchmark.png" alt="benchmark" style="width: 90%;">
</div>

## 📦 Dataset Overview

This dataset is used for **reinforcement learning (RL)** training and includes:

* **Math** and **code** queries with ground-truth or test-case verification.
* A format that follows the [verl](https://github.com/volcengine/verl) standard and is ready for direct use.

Each example has the following fields:

```json
{
  "data_source": "...",
  "prompt": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ],
  "ability": "math",
  "reward_model": { "ground_truth": "...", "style": "rule" },
  "extra_info": { "index": 0, "split": "train" }
}
```

#### Field Descriptions:

* **data\_source**: Name of the dataset source (e.g., *NuminaMath*).
* **prompt**: Typically includes a system instruction and a user query.
* **ability**: Indicates the skill domain required (e.g., *math* or *code*).
* **reward\_model**: Contains the math ground truth or code test cases, as well as the evaluation rule type.
* **extra\_info**: Includes metadata such as sample index and dataset split.

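
The record layout described above can be checked with a few lines of plain Python. The following is a minimal sketch, not part of the dataset's tooling or of verl: the helper names `validate_record` and `user_query` are hypothetical, and the record values (system message, question, answer) are placeholders invented for illustration. It assumes each record has already been decoded into a dictionary with the five top-level fields listed above.

```python
import json

# Hypothetical record mirroring the documented schema; every value is a placeholder.
RAW = """
{
  "data_source": "NuminaMath",
  "prompt": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 3?"}
  ],
  "ability": "math",
  "reward_model": {"ground_truth": "5", "style": "rule"},
  "extra_info": {"index": 0, "split": "train"}
}
"""

# Top-level fields named in the dataset card.
REQUIRED_FIELDS = ("data_source", "prompt", "ability", "reward_model", "extra_info")


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems

    prompt = record["prompt"]
    if not isinstance(prompt, list) or not prompt:
        problems.append("prompt must be a non-empty list of messages")
    else:
        for i, message in enumerate(prompt):
            if not isinstance(message, dict) or not {"role", "content"} <= set(message):
                problems.append(f"prompt[{i}] needs 'role' and 'content'")

    reward = record["reward_model"]
    if not isinstance(reward, dict) or not {"ground_truth", "style"} <= set(reward):
        problems.append("reward_model needs 'ground_truth' and 'style'")
    return problems


def user_query(record: dict) -> str:
    """Return the content of the last user message in the prompt."""
    for message in reversed(record["prompt"]):
        if message["role"] == "user":
            return message["content"]
    raise ValueError("no user message in prompt")


record = json.loads(RAW)
print(validate_record(record))
print(user_query(record))
```

A check like this is useful before handing records to an RL trainer, since a missing `reward_model` entry would otherwise surface only when the reward is computed.
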
## 📚 Citation

If you use this dataset or model in your work, please cite:

```
@misc{ji2025amthinkingv1advancingfrontierreasoning,
      title={AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale},
      author={Yunjie Ji and Xiaoyu Tian and Sitong Zhao and Haotian Wang and Shuaiting Chen and Yiping Peng and Han Zhao and Xiangang Li},
      year={2025},
      eprint={2505.08311},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.08311},
}
```


Provider: maas
Created: 2025-05-21
