AReaL-RL-Data

Name: AReaL-RL-Data
Creator: maas
Published: 2026-01-03 14:36:40
License: Not specified

ModelScope community. Updated: 2026-01-03. Indexed: 2025-03-01.

Download link:

https://modelscope.cn/datasets/inclusionAI/AReaL-RL-Data

Description:

# AReaL: Ant Reasoning RL

**A fully open-sourced and inclusive RL project for large reasoning models**

AReaL (Ant Reasoning RL) is an open-source, efficient reinforcement learning system developed at **the RL Lab, Ant Research**. AReaL inherits and adapts the open-source project [ReaLHF](https://github.com/openpsi-project/ReaLHF) for training Large Reasoning Models (LRMs) that everyone can reproduce and contribute to. AReaL is part of Ant Research's effort to develop tools and systems for a fully open and inclusive AGI world.

**AReaL Highlights**

- 🛠️ **Open & Reproducible**: We will continuously release *all code, datasets, and training recipes* for training LRMs, with no hidden secrets or proprietary barriers.
- 🚀 **Scalable Performance**: AReaL adapts to different computational resource settings, ranging from a single node to hundreds of GPUs.
- 🌍 **Community-Driven AGI**: With a fully open-source commitment, we hope our efforts can benefit the entire community and accelerate AGI research.

**GitHub URL**: https://github.com/inclusionAI/AReaL

---

# Content

We release our training dataset in this repository. The RL training dataset consists of 40k high-quality mathematical reasoning tasks released by [DeepScaleR](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset). We are also actively developing better datasets, suitable for training stronger and larger models, for future releases.

+ `data/id2info.json`: The solutions to each question, indexed by query ID. Used for computing rewards during training.
+ `prompts_for_r1_distilled.jsonl`: The dataset for training the [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model.
+ `prompts_for_zero.jsonl`: The dataset for training the R1-Zero-style model from [`Qwen2.5-7B`](https://huggingface.co/Qwen/Qwen2.5-7B).
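
Since `id2info.json` is indexed by query ID, each prompt record presumably carries an ID that links it to its reference solutions. The sketch below shows one way to load a prompts file and pair every record with its `id2info.json` entry. The key name `query_id` and the helper names `load_jsonl` and `attach_reference_info` are assumptions made for illustration; they are not taken from the dataset documentation, so inspect the downloaded files and adjust `id_field` as needed.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_json(path):
    """Read a single JSON document (e.g. id2info.json) into a Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def attach_reference_info(prompts, id2info, id_field="query_id"):
    """Pair each prompt record with its entry in id2info.

    `id_field` is an ASSUMED key name, not confirmed by the dataset card.
    Returns (matched, missing): a list of (prompt_record, info) tuples and
    a list of IDs that have no entry in id2info.
    """
    matched, missing = [], []
    for record in prompts:
        qid = str(record[id_field])  # JSON object keys are always strings
        if qid in id2info:
            matched.append((record, id2info[qid]))
        else:
            missing.append(qid)
    return matched, missing
```

A typical call would be `attach_reference_info(load_jsonl(".../prompts_for_r1_distilled.jsonl"), load_json(".../id2info.json"))`. A non-empty `missing` list suggests either a wrong `id_field` or mismatched file versions.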

To reproduce our results, please refer to the [step-by-step guideline](https://github.com/inclusionAI/AReaL/examples/) on GitHub.

```bash
# Download the dataset into DATA_PATH (wget -P keeps the current directory unchanged)
DATA_PATH=/storage/datasets/
wget -P "$DATA_PATH" https://www.modelscope.cn/datasets/inclusionAI/AReaL-RL-Data/resolve/master/data/prompts_for_r1_distilled.jsonl
wget -P "$DATA_PATH" https://www.modelscope.cn/datasets/inclusionAI/AReaL-RL-Data/resolve/master/data/id2info.json

# Training in a Ray cluster with 16 nodes
# (assumes the commands run from the root of a local AReaL checkout, so ./examples resolves)

# stage 1
MODEL_PATH=/path/to/DeepSeek-R1-Distill-Qwen-1.5B  # placeholder: local path to the base model
bash ./examples/train_1.5B_n16_on_ray.sh "$MODEL_PATH" "$DATA_PATH" 8192

# stage 2
MODEL_PATH=/path/to/model_from_stage_1  # placeholder: checkpoint produced by stage 1
bash ./examples/train_1.5B_n16_on_ray.sh "$MODEL_PATH" "$DATA_PATH" 16384

# stage 3
MODEL_PATH=/path/to/model_from_stage_2  # placeholder: checkpoint produced by stage 2
bash ./examples/train_1.5B_n16_on_ray.sh "$MODEL_PATH" "$DATA_PATH" 24000
```


Provider: maas

Created: 2025-02-24
