MLLM Art Appreciation Evaluation Results and Correct Response Terms Appendix

Name: MLLM Art Appreciation Evaluation Results and Correct Response Terms Appendix
Creator: Monash University
Published: 2024-03-14 11:20:38
License: No description available

Source: DataCite Commons (updated 2024-03-14, indexed 2025-04-16)

Download link:

https://bridges.monash.edu/articles/dataset/MLLM_Art_Appreciation_Evaluation_Results_and_Correct_Response_Terms_Appendix/25406851/1

Description:

Multi-modal large language models (MLLMs) are primarily evaluated on objective measures such as reasoning, common sense and pattern recognition. However, there is a notable lack of testing that involves open-ended responses requiring human evaluation. In response, this paper presents a comparative analysis of the capacities of GPT-4V, Gemini Pro, Gemini Ultra and MPLUG Owl2 in visual art appreciation, a domain that demands complex competencies demonstrating higher-order cognitive fluency and is therefore a ripe area for evaluating human-like intelligence. A framework for the machine appreciation of art was developed on the foundation of an established model of human aesthetic experience. Seven questions were designed to assess each stage of this framework, which outlines the nuanced capacities by which MLLMs can appreciate a visual art image. The MLLMs were assessed on their long-form responses to this question set for ten distinct art images representing varying styles and mediums.

Provided by: Monash University
Created: 2024-03-14
