heli-stand/hh-rlhf
Source: Hugging Face | Updated: 2025-02-12 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/heli-stand/hh-rlhf
Description:
This dataset is a processed version of Anthropic's HH-RLHF dataset, prepared for training models with the TRL library on preference learning and alignment tasks. It consists of pairs of text samples in which one response is labeled chosen and the other rejected, according to human preferences about the helpfulness and harmlessness of the responses. Training on these pairs lets a model learn which responses people judge more helpful and harmless, improving its ability to assist users effectively.
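To make the chosen/rejected pairing concrete, the following is a minimal sketch of how one preference record might be split into a shared prompt and two candidate replies. It assumes the transcript layout of the original Anthropic release (columns named `chosen` and `rejected`, each holding a full dialogue with "Human:" and "Assistant:" turns); this processed version may already store the fields differently. The helper `split_pair` and the sample record are hypothetical and written only for illustration.

```python
# Assumed turn marker, taken from the original Anthropic transcript layout.
ASSISTANT_TAG = "\n\nAssistant:"


def split_pair(record):
    """Split two full transcripts into a shared prompt and two final replies.

    Hypothetical helper: assumes both transcripts are identical up to the
    last assistant turn and differ only in that final reply.
    """
    chosen, rejected = record["chosen"], record["rejected"]

    # The prompt is everything up to and including the last assistant tag.
    cut = chosen.rfind(ASSISTANT_TAG)
    if cut == -1:
        raise ValueError("no assistant turn found in the chosen transcript")
    cut += len(ASSISTANT_TAG)
    prompt = chosen[:cut]

    # Both transcripts must share the same dialogue history.
    if not rejected.startswith(prompt):
        raise ValueError("chosen and rejected do not share a prompt")

    return {
        "prompt": prompt,
        "chosen": chosen[cut:].strip(),
        "rejected": rejected[cut:].strip(),
    }


# Invented sample record, not taken from the dataset.
example = {
    "chosen": "\n\nHuman: How do I boil an egg?"
              "\n\nAssistant: Simmer it for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?"
                "\n\nAssistant: I don't know.",
}

pair = split_pair(example)
print(pair["chosen"])
print(pair["rejected"])
```

A prompt plus one preferred and one dispreferred reply is the general shape that preference-learning trainers consume; check the actual column names of this dataset before relying on the helper.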
Provided by:
heli-stand



