MMT-Bench
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/opengvlab/mmt-bench
Description:
MMT-Bench is a benchmark dataset for evaluating the general capabilities of Multimodal Large Language Models (MLLMs). The benchmark highlights the importance of prompt sensitivity and customization for accurately assessing model performance. Its subset, MMT-S, contains 83 tasks spanning 19 categories and is intended to assess multimodal models across a range of tasks.



