NPHardEval4V
Source: arXiv · Updated: 2024-03-06 · Indexed: 2024-06-21
Download link:
https://github.com/lizhouf/NPHardEval4V
Resource description:
NPHardEval4V is a dynamic reasoning benchmark for evaluating the pure reasoning abilities of multimodal large language models (MLLMs). It was jointly developed by the School of Information at the University of Michigan, the School of Control Science and Engineering at Shandong University, and Microsoft Research Asia. The dataset is built by converting the textual problem descriptions of the original NPHardEval benchmark into image representations. Its tasks span several levels of complexity, and it is updated monthly to keep it fresh and challenging. Its main use is to assess, and help improve, MLLM reasoning in complex problem solving and task completion, particularly on NP-complete and NP-hard problems.
Provided by:
School of Information, University of Michigan; School of Control Science and Engineering, Shandong University; Microsoft Research Asia
Created:
2024-03-04



