NEWTON
Source: arXiv, indexed 2025-09-30
Download link:
https://newtonreasoning.github.io
Resource summary:
NEWTON is a repository and benchmark for evaluating the physical reasoning capabilities of large language models (LLMs). It contains 2,800 object-attribute pairs and 160,000 question-answer pairs. NEWTON also offers researchers domain-specific adaptation, and it evaluates in a structured way how well language models understand, apply, and analyze physical properties.



