amishor/reinforce-learning

Name: amishor/reinforce-learning
Creator: amishor
Published: 2025-10-19 04:09:51
License: No description provided

Source: Hugging Face
Updated: 2025-10-19
Indexed: 2025-10-25

Download link:

https://hf-mirror.com/datasets/amishor/reinforce-learning

Overview:

The DAPO-RL-Instruct dataset is a high-quality instruction-following dataset derived from the technical report "DAPO: An Open-Source LLM Reinforcement Learning System at Scale". It turns the paper's key technical concepts and training strategies into approximately 1,200 instruction-response pairs, intended for fine-tuning and evaluating large language models in reinforcement learning settings. The dataset is distributed in JSON Lines (.jsonl) format and is intended for research and educational purposes.
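Because the data ships as JSON Lines (one JSON object per line), the instruction-response pairs can be read with the Python standard library alone. The listing does not document the record schema or the file name, so the field names "instruction" and "response" below are assumptions to verify against the actual file, and the sample records are placeholders rather than real dataset content. This is a minimal sketch, not an official loader.

```python
import json
from typing import Iterable, Iterator

# ASSUMPTION: the dataset card does not document its schema, so these
# field names are hypothetical and must be checked against the real file.
INSTRUCTION_KEY = "instruction"
RESPONSE_KEY = "response"


def iter_pairs(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (instruction, response) pairs from JSON Lines text.

    Each non-blank line must hold one JSON object; blank lines are skipped.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        try:
            yield record[INSTRUCTION_KEY], record[RESPONSE_KEY]
        except KeyError as exc:
            # Report which line lacked the expected (assumed) field.
            raise KeyError(f"line {lineno}: missing field {exc}") from None


# Placeholder records for illustration only; not taken from the dataset.
sample = """\
{"instruction": "placeholder question 1", "response": "placeholder answer 1"}

{"instruction": "placeholder question 2", "response": "placeholder answer 2"}
"""

pairs = list(iter_pairs(sample.splitlines()))
print(len(pairs))  # 2
```

For a downloaded file, pass the open file object instead of the sample string, e.g. `with open(path, encoding="utf-8") as f: pairs = list(iter_pairs(f))`, where `path` is whatever .jsonl file the repository actually contains. Streaming line by line keeps memory use flat, which matters little at roughly 1,200 records but carries over to larger JSONL files.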

Provided by:

amishor
