PolyMATH
Source: arXiv. Indexed: 2025-09-30
Download link:
https://huggingface.co/datasets/him1411/polymath
Description:
PolyMATH is a dataset of 5,000 manually collected, high-quality images spanning 10 distinct categories, including pattern recognition, spatial reasoning, and relative reasoning. It is designed to evaluate the general cognitive reasoning capabilities of multimodal large language models (MLLMs). The authors solved the problems in the dataset themselves to establish a human performance baseline, and the dataset is also used to evaluate various prompting strategies for MLLMs.
Provided by:
Authors of the paper



