AM-Thinking-v1-RL-Dataset
Source: ModelScope community (魔搭社区) · Updated 2026-05-23 · Indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/a-m-team/AM-Thinking-v1-RL-Dataset
Description:
<p align="center">
🤗 <a href="https://huggingface.co/collections/a-m-team/am-thinking-v1-682b2247b8143767802056a6">AM-Thinking-v1 Collections</a>   |    📑 <a href="https://arxiv.org/abs/2505.08311"> Paper</a>    |    📑 <a href="https://a-m-team.github.io/am-thinking-v1/">Blog</a>   
</p>
## 🚀 Introduction
We release the **Math & Code RL training dataset** used to build [AM-Thinking-v1](https://huggingface.co/a-m-team/AM-Thinking-v1), a 32B dense language model designed for high-level reasoning.
AM-Thinking-v1 is built on top of **Qwen 2.5-32B-Base**, and demonstrates strong performance in math and code reasoning tasks, rivaling much larger models like **Qwen3‑235B‑A22B** and **Seed1.5-Thinking**, while being deployable on a single A100 (80GB).
<div style="text-align: center;">
<img src="assets/benchmark.png" alt="benchmark" style="width: 90%;">
</div>
## 📦 Dataset Overview
This dataset is used for **reinforcement learning (RL)** training and includes:
* **Math** and **code** queries with ground truth or test-case verification.
* Format follows the [verl](https://github.com/volcengine/verl) standard and is ready for direct use.
Each example has the following fields:
```json
{
  "data_source": "...",
  "prompt": [
    {
      "role": "system",
      "content": "..."
    },
    {
      "role": "user",
      "content": "..."
    }
  ],
  "ability": "math",
  "reward_model": {
    "ground_truth": "...",
    "style": "rule"
  },
  "extra_info": {
    "index": 0,
    "split": "train"
  }
}
```
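The record layout above can be checked programmatically before training. The following is a minimal sketch using only the Python standard library; the `validate_record` helper and the `REQUIRED_KEYS` set are hypothetical names introduced for illustration and are not part of the dataset or of verl.

```python
import json
from typing import List

# Top-level keys taken from the example record above.
REQUIRED_KEYS = {"data_source", "prompt", "ability", "reward_model", "extra_info"}


def validate_record(record: dict) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
        return problems

    # "prompt" is a list of chat messages, each with a role and content.
    prompt = record["prompt"]
    if not isinstance(prompt, list) or not prompt:
        problems.append("prompt must be a non-empty list of messages")
    else:
        for i, msg in enumerate(prompt):
            if not isinstance(msg, dict) or not {"role", "content"} <= msg.keys():
                problems.append(f"prompt[{i}] needs 'role' and 'content'")

    # "reward_model" carries the reference answer and the rule type.
    rm = record["reward_model"]
    if not isinstance(rm, dict) or "ground_truth" not in rm or "style" not in rm:
        problems.append("reward_model needs 'ground_truth' and 'style'")
    return problems


sample = json.loads("""
{
  "data_source": "...",
  "prompt": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "ability": "math",
  "reward_model": {"ground_truth": "...", "style": "rule"},
  "extra_info": {"index": 0, "split": "train"}
}
""")

print(validate_record(sample))  # → []
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole file in a single pass.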
#### Field Descriptions:
* **data\_source**: Name of the dataset source (e.g., *NuminaMath*).
* **prompt**: Chat messages; typically a system instruction followed by a user query.
* **ability**: Indicates the skill domain required (e.g., *math* or *code*).
* **reward\_model**: Contains the math ground truth or code test cases, as well as the evaluation rule type.
* **extra\_info**: Includes metadata such as sample index and dataset split.
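To illustrate how the `reward_model.ground_truth` field might drive a rule-based reward for math samples, here is a minimal sketch. It assumes the model's final answer appears inside `\boxed{...}` and that a whitespace-insensitive exact match is sufficient; both are assumptions made for illustration, not the scoring rule used by the authors or by verl. The function names `extract_boxed`, `normalize`, and `math_reward` are hypothetical.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, allowing nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Drop all whitespace and surrounding dollar signs (assumed normalization)."""
    return re.sub(r"\s+", "", answer).strip("$")


def math_reward(response: str, record: dict) -> float:
    """Return 1.0 if the boxed answer matches the ground truth, else 0.0."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    truth = record["reward_model"]["ground_truth"]
    return 1.0 if normalize(predicted) == normalize(truth) else 0.0


# Hypothetical record, shaped like the dataset's examples.
record = {"ability": "math", "reward_model": {"ground_truth": "\\frac{1}{2}", "style": "rule"}}
print(math_reward("The answer is \\boxed{\\frac{1}{2}}.", record))
print(math_reward("The answer is \\boxed{3}.", record))
```

Code samples would need a different checker that runs the stored test cases in a sandbox; that path is omitted here because the test-case layout is not described in this card.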
## 📚 Citation
If you use this dataset or model in your work, please cite:
```
@misc{ji2025amthinkingv1advancingfrontierreasoning,
  title={AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale},
  author={Yunjie Ji and Xiaoyu Tian and Sitong Zhao and Haotian Wang and Shuaiting Chen and Yiping Peng and Han Zhao and Xiangang Li},
  year={2025},
  eprint={2505.08311},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.08311},
}
```
Provider: maas
Created: 2025-05-21



