mncai/orpo-vlm-pairs

Name: mncai/orpo-vlm-pairs
Creator: mncai
Published: 2026-02-06 02:26:04
License: 暂无描述

Hugging Face2026-02-06 更新2026-02-07 收录

下载链接：

https://hf-mirror.com/datasets/mncai/orpo-vlm-pairs

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集包含视觉语言偏好对，用于使用ORPO（Odds Ratio Preference Optimization）、DPO或类似的基于偏好的对齐方法训练VLM模型。数据集包含67,754个经过过滤/精炼的偏好对和11,982张图片（存储在mncai/orpo-vlm-pairs-full中）。数据格式为JSONL（不包含图片），语言为英语，任务为视觉语言偏好学习。每个数据行包含prompt（带有图像引用的聊天消息）、chosen（首选响应）、rejected（非首选响应）和meta（包括源数据集、使用的模型、判断信息等元数据）字段。图片文件需从其他链接下载。数据集来源于DocMatix等源数据集，并继承其许可协议。

This dataset contains vision-language preference pairs for training VLM models using ORPO (Odds Ratio Preference Optimization), DPO, or similar preference-based alignment methods. The dataset includes 67,754 filtered/refined preference pairs and 11,982 images (hosted in mncai/orpo-vlm-pairs-full). The format is JSONL (images not included), the language is English, and the task is vision-language preference learning. Each row contains prompt (chat messages with image references), chosen (preferred response), rejected (non-preferred response), and meta (metadata including source dataset, models used, judge info) fields. Image files need to be downloaded from another link. The dataset is derived from source datasets like DocMatix and inherits their licenses.

提供机构：

mncai

5,000+

优质数据集

54 个

任务类型

进入经典数据集