AReaL-RL-Data
Source: ModelScope community. Updated: 2026-01-03. Indexed: 2025-03-01.
Download link:
https://modelscope.cn/datasets/inclusionAI/AReaL-RL-Data
Description:
# AReaL: Ant Reasoning RL
**A fully open-source and inclusive RL project for large reasoning models**
AReaL (Ant Reasoning RL) is an open-source, efficient reinforcement learning system developed at **the RL Lab, Ant Research**. It inherits and adapts the open-source project [ReaLHF](https://github.com/openpsi-project/ReaLHF) for training Large Reasoning Models (LRMs) that everyone can reproduce and contribute to. AReaL is part of Ant Research's effort to develop tools and systems for a fully open and inclusive AGI world.
**AReaL Highlights**
- 🛠️ **Open & Reproducible**: We will continue to release *all code, datasets, and training recipes* for training LRMs, with no hidden secrets or proprietary barriers.
- 🚀 **Scalable Performance**: AReaL adapts seamlessly to different computational resource settings, from a single node to hundreds of GPUs.
- 🌍 **Community-Driven AGI**: With a fully open-source commitment, we hope our work benefits the entire community and accelerates AGI research.
**GitHub URL**: https://github.com/inclusionAI/AReaL
---
# Content
We release our training dataset in this repository.
The RL training dataset consists of 40k high-quality mathematical reasoning tasks released by [DeepScaleR](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset).
We are also developing better datasets, suited to training stronger and larger models, for future releases.
+ `data/id2info.json`: The solutions to each question, indexed by query ID. Used to compute rewards during training.
+ `data/prompts_for_r1_distilled.jsonl`: The dataset for training the [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model.
+ `prompts_for_zero.jsonl`: The dataset for training the R1-Zero-style model from [`Qwen2.5-7B`](https://huggingface.co/Qwen/Qwen2.5-7B).
To reproduce our results, please refer to the [step-by-step guide](https://github.com/inclusionAI/AReaL/examples/) on GitHub.
```bash
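# Download the dataset into DATA_PATH
DATA_PATH=/storage/datasets/
mkdir -p "$DATA_PATH"
wget -P "$DATA_PATH" https://www.modelscope.cn/datasets/inclusionAI/AReaL-RL-Data/resolve/master/data/prompts_for_r1_distilled.jsonl
wget -P "$DATA_PATH" https://www.modelscope.cn/datasets/inclusionAI/AReaL-RL-Data/resolve/master/data/id2info.json

# Training in a Ray cluster with 16 nodes.
# The script paths are relative, so these commands are assumed to be run
# from the root of a cloned AReaL repository.

# Stage 1: start from the base model (replace the placeholder path)
MODEL_PATH=/path/to/DeepSeek-R1-Distill-Qwen-1.5B
bash ./examples/train_1.5B_n16_on_ray.sh "$MODEL_PATH" "$DATA_PATH" 8192

# Stage 2: continue from the stage 1 output (replace the placeholder path)
MODEL_PATH=/path/to/model_from_stage_1
bash ./examples/train_1.5B_n16_on_ray.sh "$MODEL_PATH" "$DATA_PATH" 16384

# Stage 3: continue from the stage 2 output (replace the placeholder path)
MODEL_PATH=/path/to/model_from_stage_2
bash ./examples/train_1.5B_n16_on_ray.sh "$MODEL_PATH" "$DATA_PATH" 24000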
```
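Before launching a multi-node run, it can help to confirm that the two downloaded files are in place. The following is a minimal sketch and not part of AReaL: `check_dataset` is a hypothetical helper, and it assumes that both files sit directly under the data directory (where the `wget` commands above leave them) and that the `.jsonl` file holds one record per line.

```bash
#!/usr/bin/env bash
# Illustrative pre-flight check (hypothetical helper, not shipped with AReaL).
# Usage: check_dataset /storage/datasets

check_dataset() {
    local data_path="$1"
    local prompts="$data_path/prompts_for_r1_distilled.jsonl"
    local id2info="$data_path/id2info.json"
    local f n

    for f in "$prompts" "$id2info"; do
        # -s is true only when the file exists and is non-empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done

    # Assumption: one record per line. grep -c '' counts every line,
    # including blank ones and a final line without a trailing newline.
    n=$(grep -c '' "$prompts")
    echo "prompts: $n"
}
```

Calling `check_dataset "$DATA_PATH"` prints the number of lines in the prompt file, or returns a non-zero status and names the missing file on standard error.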
Provider:
maas
Created:
2025-02-24



