kevinpro/R-PRM

Name: kevinpro/R-PRM
Creator: kevinpro
Published: 2025-03-28 07:03:50
License: No description available

Hugging Face · Updated 2025-03-28 · Indexed 2025-04-12

Download link:

https://hf-mirror.com/datasets/kevinpro/R-PRM

Description:

The R-PRM dataset was developed for training Reasoning-Driven Process Reward Models and covers two training stages: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It is used to train a generative reward model that analyzes and judges mathematical reasoning processes step by step, improving both the quality of its evaluations and the guidance it provides to policy models.
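
The DPO stage can be illustrated with the standard DPO objective. The sketch below is not R-PRM's training code. It assumes each preference pair holds a "chosen" and a "rejected" reward-model judgment for the same reasoning trace, with sequence log-probabilities already summed under the policy being trained and under a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are hypothetical, illustrative choices.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large positive or negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; controls deviation from the reference
) -> float:
    # Log-ratio of policy vs. frozen reference for each judgment.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen judgment over the rejected one.
    margin = beta * (chosen_ratio - rejected_ratio)
    # DPO loss for one pair: -log sigmoid(margin).
    return -log_sigmoid(margin)


# Usage: identical log-probabilities give a loss of log(2); shifting
# probability mass toward the chosen judgment lowers the loss.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Working in log space with a stable log-sigmoid avoids overflow when the margin is large, which is why the loss is not computed as `-math.log(1 / (1 + math.exp(-margin)))` directly.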

Provider:

kevinpro