gp02-mcgill/ultrafeedback_binarised_min_max
Hugging Face | Updated 2025-01-31 | Indexed 2025-02-15
Download link:
https://hf-mirror.com/datasets/gp02-mcgill/ultrafeedback_binarised_min_max
Description:
ultrafeedback_binarised_max_min is a pairwise preference dataset for training models that require binary preference labels. It is derived from the UltraFeedback dataset, which provides high-quality feedback for improving language models. Preprocessing consists of average rating computation, pairwise labeling, and binarization, which makes the dataset suitable for preference-learning tasks such as reinforcement learning from human feedback (RLHF) and preference-based ranking.
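
The three preprocessing steps named above (average rating, pairwise labeling, binarization) can be sketched roughly as follows. This is an illustration under stated assumptions, not the dataset authors' actual script: the record layout (`prompt`, `completions`, `response`, `ratings`) and the function name `binarise_min_max` are hypothetical, and pairing the highest-rated completion against the lowest-rated one is only a guess based on the "min_max" in the repository name.

```python
from statistics import mean


def binarise_min_max(record):
    """Turn one multi-completion record into a single chosen/rejected pair.

    Assumed (hypothetical) layout: record["completions"] is a list of dicts,
    each with a "response" string and a "ratings" list of numeric scores.
    Returns None when no strict preference can be formed.
    """
    # Step 1: average rating per completion (skip completions with no ratings).
    scored = [
        (mean(c["ratings"]), c["response"])
        for c in record["completions"]
        if c["ratings"]
    ]
    if len(scored) < 2:
        return None

    # Step 2: pairwise labeling, here best average vs. worst average (assumption).
    best = max(scored, key=lambda s: s[0])
    worst = min(scored, key=lambda s: s[0])
    if best[0] == worst[0]:
        return None  # a tie carries no preference signal

    # Step 3: binarization, keeping only which response wins, not the scores.
    return {
        "prompt": record["prompt"],
        "chosen": best[1],
        "rejected": worst[1],
    }


# Made-up example record, for illustration only.
example = {
    "prompt": "Explain overfitting in one sentence.",
    "completions": [
        {"response": "A", "ratings": [5, 4, 5, 4]},
        {"response": "B", "ratings": [3, 3, 4, 2]},
        {"response": "C", "ratings": [1, 2, 1, 2]},
    ],
}
pair = binarise_min_max(example)
```

Dropping the numeric scores in the last step is what makes the labels binary: a downstream preference trainer only sees that one response was preferred over the other.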
Provider:
gp02-mcgill



