NEWTONReasoning/NEWTON

Name: NEWTONReasoning/NEWTON
Creator: NEWTONReasoning
Published: 2023-11-14 06:26:13
License: No description available

Hugging Face · Updated 2023-11-14 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/NEWTONReasoning/NEWTON


Resource overview:

---
configs:
- config_name: default
  data_files:
  - split: confident_questions
    path: "confident_questions.csv"
  - split: explicit_questions
    path: "explicit_questions.csv"
  - split: implicit_questions
    path: "implicit_questions.csv"
---

🌟 **NEWTON: Evaluating Large Language Models for Physics Reasoning** 🌟

Are you curious about the physical reasoning abilities of Large Language Models (LLMs) like GPT-4 in different contextualized settings? Look no further! NEWTON is here to help.

🚀 **What is NEWTON?** 🚀

NEWTON is a repository and benchmark designed to assess the physics reasoning skills of LLMs. While these models excel in many language tasks, their grasp of physical concepts often remains unexplored.

🔬 **What's Inside NEWTON?** 🔬

* **Repository**: We provide a collection of 2800 object-attribute pairs, serving as a foundation for generating customizable assessment templates tailored to your specific needs.
* **Benchmark**: We've curated 160k QA questions to evaluate LLMs across foundational, explicit, and implicit physics reasoning tasks. Discover how these models perform in scenarios involving everyday objects and attributes.
* **Pipeline**: A pipeline to synthesize evaluation sets tailored to particular applications.

🤖 **Real-World Applications** 🤖

NEWTON's potential extends beyond evaluation. It can pave the way for integrating LLMs into physically grounded settings, such as robotic manipulation.

❓ If you have any questions, please contact [me](https://helen9975.github.io/) at `yiruwang [at] cs [dot] washington [dot] edu`. ❓
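The `configs` block in the card maps three split names to CSV files. As a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable, a single split could be loaded as shown below. The helper name `load_newton_split` and the constants are illustrative and not part of the dataset; the card does not list column names, so those should be inspected after loading.

```python
# Split-to-file mapping copied from the dataset card's `configs` block.
DATA_FILES = {
    "confident_questions": "confident_questions.csv",
    "explicit_questions": "explicit_questions.csv",
    "implicit_questions": "implicit_questions.csv",
}

# Repository ID as given on the listing page.
REPO_ID = "NEWTONReasoning/NEWTON"


def load_newton_split(split: str):
    """Load one split via the `datasets` library (hypothetical helper; needs network access)."""
    if split not in DATA_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}"
        )
    # Imported lazily so the mapping above stays usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

Calling `load_newton_split("explicit_questions")` and then reading its `column_names` attribute is one way to discover the schema before writing any evaluation code against it.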

Provider:

NEWTONReasoning

Summary of original information

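NEWTON: Evaluating Large Language Models for Physics Reasoning

Introduction

NEWTON is a repository and benchmark for evaluating the physics reasoning abilities of large language models (LLMs). Although these models perform well on many language tasks, their understanding of physical concepts often remains unexplored.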

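Contents

Repository: Provides 2800 object-attribute pairs for generating customized assessment templates.
Benchmark: Contains 160k QA questions for evaluating LLMs on foundational, explicit, and implicit physics reasoning tasks.
Pipeline: A pipeline for synthesizing evaluation sets tailored to particular applications.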
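To illustrate how object-attribute pairs can feed a question template, here is a toy sketch. It is not NEWTON's pipeline: the objects, attributes, template wording, and the function `generate_questions` are all hypothetical and are not taken from the dataset.

```python
from itertools import product

# Hypothetical objects and attributes, NOT taken from the NEWTON data.
OBJECTS = ["glass cup", "pillow"]
ATTRIBUTES = ["brittleness", "softness"]

# Hypothetical question template; NEWTON's real templates may look different.
TEMPLATE = "Does a {obj} have high {attr}?"


def generate_questions(objects, attributes, template=TEMPLATE):
    """Fill the template once for every (object, attribute) combination."""
    return [
        template.format(obj=obj, attr=attr)
        for obj, attr in product(objects, attributes)
    ]


questions = generate_questions(OBJECTS, ATTRIBUTES)
for question in questions:
    print(question)
```

Separating the pairs from the template means a new evaluation set only needs a different template string, which is the general idea behind customizable assessment templates.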

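Applications

NEWTON is not limited to evaluation; it can also support integrating LLMs into physically grounded settings such as robotic manipulation.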
