
MLLM-AS-A-JUDGE

Source: arXiv (2024-02-07); updated 2024-06-21
Download link:
https://github.com/Dongping-Chen/MLLM-as-a-Judge
Resource description:

MLLM-AS-A-JUDGE is a comprehensive dataset created by a research team from Lehigh University to evaluate how well multimodal large language models (MLLMs) perform as judges in the vision-language domain. It contains 3,300 image-instruction pairs covering diverse tasks such as image captioning, mathematical reasoning, text reading, and infographic understanding. To build it, the authors selected datasets from 10 distinct tasks, generated responses with four mainstream MLLMs, and had human evaluators annotate those responses rigorously. The dataset is primarily intended to help bring MLLM judgments closer to human preferences, particularly in three settings: scoring evaluation, pairwise comparison, and batch ranking.
Provided by:
Lehigh University
Created:
2024-02-07