HANI-LAB/Med-REFL-DPO
Source: Hugging Face. Updated: 2025-06-19. Indexed: 2025-11-29.
Download link:
https://hf-mirror.com/datasets/HANI-LAB/Med-REFL-DPO
Resource description:
Med-REFL is a Direct Preference Optimization (DPO) dataset designed to improve the reasoning and reflection capabilities of large language models in the medical domain. It is constructed with a Tree-of-Thought (ToT) approach and has two subsets: Reasoning Enhancement Data, with approximately 12,000 preference pairs for improving general reasoning discernment, and Reflection Enhancement Data, with approximately 21,000 preference pairs focused on model self-correction.
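To make the term "preference pair" concrete, here is a minimal sketch of how one such pair enters the standard DPO objective. The record layout (`prompt`, `chosen`, `rejected`) is a common convention and is an assumption here, not something this listing confirms. The helper `dpo_loss`, its `beta` default, and the log-probability values are likewise hypothetical illustrations.

```python
import math

# Hypothetical preference pair. The field names follow a common DPO
# convention and are NOT confirmed by this listing; the values are placeholders.
pair = {
    "prompt": "<medical question>",
    "chosen": "<preferred reasoning or reflection>",
    "rejected": "<dispreferred reasoning or reflection>",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probabilities.

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy favors the chosen response over the rejected one, relative to
    the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # Numerically stable softplus: log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# With no preference shift relative to the reference, the loss equals log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# When the policy favors the chosen response more than the reference does,
# the loss falls below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In practice the log-probabilities would come from scoring `chosen` and `rejected` under the model being trained and under a frozen reference copy. The numbers above are arbitrary.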
Provided by:
HANI-LAB



