MLLM-AS-A-JUDGE

Name: MLLM-AS-A-JUDGE
Creator: Lehigh University
Published: 2024-02-07 20:28:32
License: No description available

Source: arXiv (2024-02-07); entry updated 2024-06-21

Download link:

https://github.com/Dongping-Chen/MLLM-as-a-Judge

Description:

MLLM-AS-A-JUDGE is a comprehensive benchmark dataset developed by a research team at Lehigh University to evaluate how well multimodal large language models (MLLMs) perform as judges in the vision-language domain. It contains 3,300 image-instruction pairs covering diverse tasks such as image captioning, mathematical reasoning, text reading, and infographic understanding. To build it, the team selected 10 datasets spanning different tasks, generated responses with four mainstream MLLMs, and had human evaluators annotate those responses rigorously. Its main application is studying how to bring MLLM judgments closer to human preferences, particularly in three judging settings: scoring evaluation, pairwise comparison, and batch ranking.

Provider: Lehigh University

Created: 2024-02-07