amishor/reinforce-learning
Source: Hugging Face. Updated 2025-10-19. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/amishor/reinforce-learning
Description:
The DAPO-RL-Instruct Dataset is a high-quality instruction-following dataset derived from the technical report "DAPO: An Open-Source LLM Reinforcement Learning System at Scale". It recasts the paper's key technical concepts and training strategies as approximately 1,200 instruction-response pairs, suitable for fine-tuning and evaluating large language models in reinforcement learning contexts. The dataset is provided in .jsonl format and is intended for research and educational purposes.
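Since the data ships as .jsonl (one JSON object per line), a minimal reading sketch using only the Python standard library might look like the following. The field names `instruction` and `response` and the sample records are assumptions for illustration; the listing does not state the actual schema, so check the real file before relying on them.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, which are common at the end of files
            yield json.loads(line)


# Placeholder records standing in for the real file contents.
# The keys "instruction" and "response" are hypothetical.
sample_lines = [
    '{"instruction": "example instruction 1", "response": "example response 1"}',
    "",
    '{"instruction": "example instruction 2", "response": "example response 2"}',
]

records = list(iter_jsonl(sample_lines))
pairs = [(r["instruction"], r["response"]) for r in records]
```

The same function accepts an open file handle, since iterating over a text file yields lines; pass the handle from `open(path, encoding="utf-8")` with `path` pointing at the downloaded .jsonl file.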
Provided by:
amishor



