
AReaL-RL-Data

ModelScope Community. Updated 2026-01-03; indexed 2025-03-01.
Download link:
https://modelscope.cn/datasets/inclusionAI/AReaL-RL-Data
Description:
# AReaL: Ant Reasoning RL

**A fully open-sourced and inclusive RL project for large reasoning models**

AReaL (Ant Reasoning RL) is an open-source and efficient reinforcement learning system developed at **the RL Lab, Ant Research**. AReaL inherits and adapts the open-source project [ReaLHF](https://github.com/openpsi-project/ReaLHF) for training Large Reasoning Models (LRMs) that everyone can reproduce and contribute to. AReaL is part of Ant Research's effort to develop tools and systems for a fully open and inclusive AGI world.

**AReaL Highlights**

- 🛠️ **Open & Reproducible**: We will continuously release *all code, datasets, and training recipes* for training LRMs, with no hidden secrets or proprietary barriers.
- 🚀 **Scalable Performance**: AReaL can seamlessly adapt to different computational resource settings, ranging from a single node to hundreds of GPUs.
- 🌍 **Community-Driven AGI**: With a fully open-source commitment, we hope our efforts can benefit the entire community and accelerate AGI research.

**GitHub URL**: https://github.com/inclusionAI/AReaL

---

# Content

We release our training dataset in this repository. The RL training dataset consists of 40k high-quality mathematical reasoning tasks released by [DeepScaleR](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset). We are also actively developing better datasets, suitable for training stronger and larger models, for future releases.

+ `data/id2info.json`: The solutions to each question, indexed by query ID. Used for computing rewards during training.
+ `prompts_for_r1_distilled.jsonl`: The dataset for training the [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model.
+ `prompts_for_zero.jsonl`: The dataset for training the R1-Zero-style model from [`Qwen2.5-7B`](https://huggingface.co/Qwen/Qwen2.5-7B).
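The file descriptions above imply a simple join: each prompt record carries a query ID, and `id2info.json` maps that ID to the reference solutions used for reward computation. The following is a minimal Python sketch of that lookup. The field names `query_id`, `prompt`, and `solutions` are assumptions made for illustration and are not confirmed by this page, so inspect the downloaded files and adjust the keys as needed. The helpers `load_jsonl` and `attach_solutions` are hypothetical and not part of AReaL.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def attach_solutions(prompts, id2info, id_key="query_id", solutions_key="solutions"):
    """Pair each prompt record with its entry from id2info.

    Returns (matched, missing_ids): matched is a list of
    (prompt_record, solutions) tuples; missing_ids lists IDs with no entry.
    The key names are assumptions about the schema, not documented facts.
    """
    matched, missing = [], []
    for record in prompts:
        qid = str(record[id_key])  # JSON object keys are always strings
        info = id2info.get(qid)
        if info is None:
            missing.append(qid)
        else:
            matched.append((record, info.get(solutions_key)))
    return matched, missing


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (made-up data, assumed schema).
    prompts = [
        {"query_id": "q1", "prompt": "What is 1 + 1?"},
        {"query_id": "q2", "prompt": "What is 2 * 3?"},
    ]
    id2info = {"q1": {"solutions": ["2"]}}
    matched, missing = attach_solutions(prompts, id2info)
    print(len(matched), "matched;", "missing:", missing)
```

With the real files, the same functions could be applied to `load_jsonl("prompts_for_r1_distilled.jsonl")` and `json.load` of `id2info.json`; a non-empty list of missing IDs would suggest the assumed keys do not match the actual schema.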
To reproduce our results, please refer to the [step-by-step guideline](https://github.com/inclusionAI/AReaL/examples/) on GitHub.

```bash
# Download the dataset
DATA_PATH=/storage/datasets/
cd $DATA_PATH
wget https://www.modelscope.cn/datasets/inclusionAI/AReaL-RL-Data/resolve/master/data/prompts_for_r1_distilled.jsonl
wget https://www.modelscope.cn/datasets/inclusionAI/AReaL-RL-Data/resolve/master/data/id2info.json

# Training in a Ray cluster with 16 nodes

# stage 1
MODEL_PATH=${path_to_DeepSeek-R1-Distill-Qwen-1.5B}
bash ./examples/train_1.5B_n16_on_ray.sh $MODEL_PATH $DATA_PATH 8192

# stage 2
MODEL_PATH=${model_path_from_stage_1}
bash ./examples/train_1.5B_n16_on_ray.sh $MODEL_PATH $DATA_PATH 16384

# stage 3
MODEL_PATH=${model_path_from_stage_2}
bash ./examples/train_1.5B_n16_on_ray.sh $MODEL_PATH $DATA_PATH 24000
```
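The three stages above form a chain: each stage starts from the checkpoint produced by the previous one and passes a larger third argument (8192, 16384, 24000) to the same script. A minimal Python sketch that assembles these command lines for a dry run is shown here. The function `build_stage_commands` and the checkpoint paths in the usage example are hypothetical. This page does not say where each stage writes its checkpoint, so those paths must be supplied by the user.

```python
import shlex

SCRIPT = "./examples/train_1.5B_n16_on_ray.sh"  # script name taken from the commands above
STAGE_VALUES = (8192, 16384, 24000)             # third argument of stages 1 to 3


def build_stage_commands(model_paths, data_path, values=STAGE_VALUES, script=SCRIPT):
    """Return one argv list per stage.

    model_paths[i] is the model that stage i+1 starts from: the base model
    for stage 1, then the checkpoint from the preceding stage.
    """
    if len(model_paths) != len(values):
        raise ValueError("need exactly one starting model path per stage")
    return [
        ["bash", script, str(model), str(data_path), str(value)]
        for model, value in zip(model_paths, values)
    ]


if __name__ == "__main__":
    # Placeholder paths: substitute the real locations on your cluster.
    commands = build_stage_commands(
        [
            "/path/to/DeepSeek-R1-Distill-Qwen-1.5B",
            "/path/to/stage1_checkpoint",
            "/path/to/stage2_checkpoint",
        ],
        "/storage/datasets/",
    )
    for argv in commands:
        print(shlex.join(argv))  # dry run: print instead of executing
```

Printing the commands rather than executing them makes it possible to review the chain before committing cluster time. Each argv list could later be passed to `subprocess.run` once the checkpoint locations are known.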

Provider: maas
Created: 2025-02-24