helehan/topic-overwrite
Source: Hugging Face | Updated: 2024-12-08 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/helehan/topic-overwrite
Description:
This dataset contains 21k pairs of chosen and rejected answers. The answers were generated by llava-1.5-7b and labeled by llava-1.6-34b. It is intended for Direct Preference Optimization (DPO) training in RLHF/RLAIF pipelines. The data were built following the procedure described in the TPO paper, i.e. the Topic-level Preference Overwriting methodology. The goal is to make multimodal large language models (MLLMs/LVLMs) more trustworthy and less prone to hallucination.
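DPO consumes pairs of a chosen and a rejected answer to the same prompt, which is the shape of this dataset. As a minimal sketch (not code from the dataset authors or the TPO paper), the standard DPO loss for a single pair can be computed from four sequence log-probabilities: the chosen and rejected answers scored by the trainable policy and by a frozen reference model. The function name `dpo_loss`, the default `beta=0.1`, and the example numbers are illustrative assumptions; the dataset's actual field names are not described here.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one chosen/rejected pair (illustrative sketch).

    Each argument is the summed log-probability of a full answer, under
    either the trainable policy or the frozen reference model. beta=0.1
    is an assumed default, not a value taken from the TPO paper.
    """
    # How much more the policy favors each answer than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical log-probabilities for one pair: the policy has shifted
# probability toward the chosen answer relative to the reference model.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(f"{loss:.4f}")
```

When the policy and reference agree exactly, the loss equals log 2; it falls below log 2 as the policy raises the chosen answer's relative likelihood and rises above it in the opposite case. In practice the per-pair losses are averaged over a batch.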
Provided by:
helehan



