MMEvalPro Multimodal Benchmark Evaluation Dataset

Source: HyperAI (超神经) | Updated 2024-08-15 | Indexed 2024-12-14

Download link:

https://hyper.ai/cn/datasets/33402

Resource summary:

MMEvalPro is an evaluation benchmark for large multimodal models (LMMs), proposed in 2024 by researchers from Peking University, the Chinese Academy of Medical Sciences, The Chinese University of Hong Kong, and Alibaba. It aims to make LMM evaluation more trustworthy and efficient by addressing problems in existing multimodal benchmarks. Those benchmarks suffer from systematic biases: even large language models (LLMs) with no visual perception at all can achieve non-trivial scores on them, which undermines the credibility of the results. MMEvalPro augments each existing question with two "anchor" questions, one testing perception and one testing knowledge, so that together they form a "question triplet" probing different aspects of a model's multimodal understanding.

Created:

2024-08-13

Compiled Summary

Dataset Introduction

Background and Challenges

Background Overview

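MMEvalPro is an evaluation benchmark for large multimodal models, proposed in 2024 by several research institutions to address the systematic biases found in existing benchmarks. It assesses a model's multimodal understanding using question triplets and a Genuine Accuracy metric, and contains more than 6,000 questions spanning a range of subjects and difficulty levels.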
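To illustrate why a triplet-level metric is harder to game than plain per-question accuracy, here is a minimal sketch. It assumes that a triplet counts toward Genuine Accuracy only when the original question and both anchor questions are all answered correctly. The names `TripletResult`, `genuine_accuracy`, and `per_question_accuracy` are hypothetical and are not part of any official MMEvalPro code.

```python
from dataclasses import dataclass


@dataclass
class TripletResult:
    """Hypothetical record: whether each question of one triplet was answered correctly."""
    original: bool    # the original multimodal question
    perception: bool  # the perception anchor question
    knowledge: bool   # the knowledge anchor question


def genuine_accuracy(results: list[TripletResult]) -> float:
    """Fraction of triplets whose three questions are all answered correctly.

    Assumption: triplet-level scoring is all-or-nothing, as sketched here.
    """
    if not results:
        return 0.0
    solved = sum(r.original and r.perception and r.knowledge for r in results)
    return solved / len(results)


def per_question_accuracy(results: list[TripletResult]) -> float:
    """Ordinary accuracy over every individual question, ignoring triplet grouping."""
    if not results:
        return 0.0
    correct = sum(r.original + r.perception + r.knowledge for r in results)
    return correct / (3 * len(results))


if __name__ == "__main__":
    demo = [
        TripletResult(original=True, perception=True, knowledge=True),
        TripletResult(original=True, perception=False, knowledge=True),
        TripletResult(original=False, perception=True, knowledge=True),
    ]
    print(f"{genuine_accuracy(demo):.3f}")       # 0.333: only one triplet is fully correct
    print(f"{per_question_accuracy(demo):.3f}")  # 0.778: 7 of 9 questions correct
```

In the demo, the second triplet shows the intended effect: a model that answers the original question correctly but fails the perception anchor earns credit under per-question accuracy, yet gets nothing for that triplet under the all-or-nothing rule.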

The above content was collected and summarized by the 遇见数据集 dataset platform.