
AM-Thinking-v1-RL-Dataset

Source: ModelScope community. Updated: 2026-05-23. Indexed: 2025-05-24.
Download link:
https://modelscope.cn/datasets/a-m-team/AM-Thinking-v1-RL-Dataset
Resource summary:
<p align="center">
🤗 <a href="https://huggingface.co/collections/a-m-team/am-thinking-v1-682b2247b8143767802056a6">AM-Thinking-v1 Collections</a>&nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://arxiv.org/abs/2505.08311">Paper</a>&nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://a-m-team.github.io/am-thinking-v1/">Blog</a>&nbsp;&nbsp;
</p>

## 🚀 Introduction

We release the **Math & Code RL training dataset** used to build [AM-Thinking-v1](https://huggingface.co/a-m-team/AM-Thinking-v1), a 32B dense language model designed for high-level reasoning.

AM-Thinking-v1 is built on top of **Qwen 2.5-32B-Base** and demonstrates strong performance on math and code reasoning tasks, rivaling much larger models such as **Qwen3‑235B‑A22B** and **Seed1.5-Thinking**, while remaining deployable on a single A100 (80GB).

<div style="text-align: center;">
<img src="assets/benchmark.png" alt="benchmark" style="width: 90%;">
</div>

## 📦 Dataset Overview

This dataset is used for **reinforcement learning (RL)** training and includes:

* **Math** and **code** queries with ground-truth or test-case verification.
* A format that follows the [verl](https://github.com/volcengine/verl) standard and is ready for direct use.

Each example has the following fields:

```json
{
  "data_source": "...",
  "prompt": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ],
  "ability": "math",
  "reward_model": {
    "ground_truth": "...",
    "style": "rule"
  },
  "extra_info": {
    "index": 0,
    "split": "train"
  }
}
```

#### Field Descriptions:

* **data_source**: Name of the dataset source (e.g., *NuminaMath*).
* **prompt**: Typically includes a system instruction and a user query.
* **ability**: Indicates the skill domain required (e.g., *math* or *code*).
* **reward_model**: Contains the math ground truth or code test cases, as well as the evaluation rule type.
* **extra_info**: Includes metadata such as the sample index and dataset split.
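The field layout described above can be checked mechanically before a record is handed to a training pipeline. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `validate_record` and `user_query` are hypothetical, the expected types are inferred only from the example record in this README, and the sample values (the system prompt, the question, and its answer) are invented for illustration.

```python
import json

# Top-level fields and the Python types implied by the README's example record.
# These types are an assumption drawn from that example, not an official schema.
EXPECTED_FIELDS = {
    "data_source": str,
    "prompt": list,
    "ability": str,
    "reward_model": dict,
    "extra_info": dict,
}


def validate_record(record):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")

    # Each prompt entry should be a chat message with a role and its content.
    prompt = record.get("prompt")
    if isinstance(prompt, list):
        for i, message in enumerate(prompt):
            if not isinstance(message, dict) or not all(
                key in message for key in ("role", "content")
            ):
                problems.append(f"prompt[{i}]: expected 'role' and 'content'")

    # The reward block must carry the reference answer or test cases.
    reward = record.get("reward_model")
    if isinstance(reward, dict) and "ground_truth" not in reward:
        problems.append("reward_model: missing ground_truth")
    return problems


def user_query(record):
    """Return the content of the first user message, or None if there is none."""
    for message in record["prompt"]:
        if message.get("role") == "user":
            return message["content"]
    return None


# Hypothetical record: the structure follows the README, the values are made up.
SAMPLE = json.loads("""
{
  "data_source": "NuminaMath",
  "prompt": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 3?"}
  ],
  "ability": "math",
  "reward_model": {"ground_truth": "5", "style": "rule"},
  "extra_info": {"index": 0, "split": "train"}
}
""")

if __name__ == "__main__":
    print(validate_record(SAMPLE))
    print(user_query(SAMPLE))
```

Returning a list of problems rather than raising on the first one makes it easier to report every malformed field in a record at once, which tends to be more useful when scanning a large training set.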

## 📚 Citation

If you use this dataset or model in your work, please cite:

```
@misc{ji2025amthinkingv1advancingfrontierreasoning,
  title={AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale},
  author={Yunjie Ji and Xiaoyu Tian and Sitong Zhao and Haotian Wang and Shuaiting Chen and Yiping Peng and Han Zhao and Xiangang Li},
  year={2025},
  eprint={2505.08311},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.08311},
}
```

Provider:
maas
Created:
2025-05-21