SciVerse
Source: arXiv. Indexed 2025-09-30.
Download link:
https://sciverse-cuhk.github.io
Resource overview:
SciVerse is a multimodal scientific evaluation benchmark for large multimodal models (LMMs). It contains 5,735 test instances organized into five versions: knowledge-free, knowledge-light, knowledge-rich, visual-rich, and vision-only. Together, these versions assess a model's scientific knowledge comprehension, multimodal content interpretation, and chain-of-thought (CoT) reasoning, with the overall aim of measuring LMM performance on scientific problem-solving.



