
GRASP

Source: arXiv | Updated: 2024-01-31 | Indexed: 2024-06-21
Download link:
https://drive.google.com/drive/folders/1F_9R1zLtAMQ7N_ IIIio6HjEBkGuuMX4M
Overview:
GRASP is a benchmark for evaluating language grounding and intuitive physics understanding in multimodal large language models (LLMs). It is built on Unity simulations and organized in two levels. Level 1 tests whether a model can relate simple textual descriptions to visual information. Level 2 tests whether the model understands intuitive physical principles such as object permanence and continuity. Along with the benchmark, the authors release an evaluation platform and use it to evaluate several state-of-the-art multimodal LLMs; the results reveal significant shortcomings in both language grounding and intuitive physics understanding. The benchmark targets a model's ability to understand and process physical events in visual scenes, particularly in abstract, simulated environments.
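As a rough illustration of how results from a two-level benchmark like this might be summarized, the sketch below computes accuracy separately for each level. The record layout (`level`, `expected`, `answer`), the function name, and the sample values are all hypothetical; they are not taken from the GRASP evaluation platform, and the numbers are not GRASP results.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Return {level: fraction of correct answers}.

    Each record is a hypothetical dict with the keys "level",
    "expected", and "answer"; this layout is an assumption made
    for illustration, not the format used by GRASP.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["level"]] += 1
        # Compare answers case-insensitively, ignoring surrounding whitespace.
        if r["answer"].strip().lower() == r["expected"].strip().lower():
            correct[r["level"]] += 1
    return {level: correct[level] / total[level] for level in total}


# Made-up records for illustration only.
example = [
    {"level": 1, "expected": "yes", "answer": "Yes"},
    {"level": 1, "expected": "no", "answer": "yes"},
    {"level": 2, "expected": "no", "answer": "no"},
    {"level": 2, "expected": "yes", "answer": "no"},
]
print(accuracy_by_level(example))
```

Reporting the two levels separately keeps a weakness in basic grounding (Level 1) distinguishable from a weakness in intuitive physics (Level 2).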
Provided by:
Osnabrück University
Created:
2023-11-15