Harmlessness Reward Model
Source: arXiv, indexed 2025-09-30
Download link:
https://huggingface.co/Ray2333/gpt2-large-harmless-reward_model
Description:
This resource provides a reward model that scores the harmlessness of generated text. It is one of the reward models used to train multi-objective reinforcement learning algorithms, and its task is reward modeling for reinforcement learning.
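In multi-objective RL, a harmlessness score like this one is typically combined with scores for other objectives. One common way to do that is linear scalarization, a weighted sum of the per-objective rewards. The sketch below illustrates that idea only. The function name `scalarize_rewards`, the objective names, the weights, and the scores are all hypothetical, and nothing here claims to be the method used by the algorithms that consume this model.

```python
from typing import Mapping


def scalarize_rewards(rewards: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Combine per-objective reward scores into one scalar via a normalized weighted sum.

    Linear scalarization is an illustrative assumption, not necessarily the
    combination rule used by any particular multi-objective RL algorithm.
    """
    # Every weighted objective must have a score.
    missing = set(weights) - set(rewards)
    if missing:
        raise KeyError(f"missing reward scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted sum, divided by the total weight so the weights act as proportions.
    return sum(weights[name] * rewards[name] for name in weights) / total_weight


# Hypothetical scores: "harmless" stands in for this reward model's output,
# "helpful" for the output of a second, hypothetical reward model.
scores = {"harmless": 1.5, "helpful": -0.5}
print(scalarize_rewards(scores, {"harmless": 0.7, "helpful": 0.3}))
```

Sweeping the weights (for example from all-harmless to all-helpful) yields a family of training objectives that trade one goal against the other. Normalizing by the total weight keeps the combined reward on the same scale as the individual scores.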



