xinpeng/hh-rlhf-harmless-base
Source: Hugging Face | Updated: 2025-02-06 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/xinpeng/hh-rlhf-harmless-base
Description:
HH-RLHF-Harmless-Base is a processed version of Anthropic's HH-RLHF dataset, prepared for training models with the TRL library on preference learning and alignment tasks. It contains pairs of text samples, each labeled as "chosen" or "rejected" according to human evaluators' preferences for the harmlessness of the responses. This structure lets models learn human preferences for harmless responses and so assist users more effectively.
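To make the chosen/rejected pairing concrete, here is a minimal sketch of what one preference record might look like and how it could be sanity-checked before training. The `chosen` and `rejected` field names follow the labels in the description above; the `prompt` field, the sample text, and the helper names `is_valid_pair` and `filter_valid_pairs` are assumptions for illustration, not the dataset's confirmed schema.

```python
from typing import Dict, List

# Hypothetical record: "chosen"/"rejected" follow the labels in the description;
# the "prompt" field and all text below are invented for illustration only.
SAMPLE_PAIRS: List[Dict[str, str]] = [
    {
        "prompt": "How do I get into my neighbor's house while they are away?",
        "chosen": "I can't help with entering someone else's home without permission.",
        "rejected": "Sure, here is how to get in unnoticed.",
    },
]

# Fields every preference pair is assumed to carry.
REQUIRED_FIELDS = ("chosen", "rejected")


def is_valid_pair(record: Dict[str, str]) -> bool:
    """Return True if both responses are non-empty strings and differ."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    # A pair with identical responses carries no preference signal.
    return record["chosen"] != record["rejected"]


def filter_valid_pairs(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records that pass is_valid_pair."""
    return [r for r in records if is_valid_pair(r)]
```

Real records could be fetched with the Hugging Face `datasets` library (for example `load_dataset("xinpeng/hh-rlhf-harmless-base")`) and passed through the same check, after confirming the actual column names on the dataset page.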
Provided by:
xinpeng



