ScratchEval

Source: arXiv. Indexed 2025-09-30.

Download link:

https://github.com/HKBUNLP/ScratchEval

Description:

ScratchEval is a benchmark for evaluating the visual programming reasoning abilities of large multimodal models (LMMs). Its items combine visual elements with embedded programming logic, so a model must interpret the image and the code structure together. The dataset is bilingual (Chinese and English) and covers four task areas: mathematics, logical thinking, graphic perception, and spatial perception.
