PhysiCo

arXiv, indexed 2025-09-30

Download link:

https://physico-benchmark.github.io/

Resource description:

PhysiCo evaluates how well large language models (LLMs) understand 52 physical concepts, using both low-level and high-level understanding subtasks. To guard against answers driven by memorization, it presents the concepts in two forms: natural language and grid representations. The dataset contains 1,200 grid example pairs covering the 52 concepts. The task is intended as a summative assessment of models' understanding of physical concepts.