NEWTONReasoning/NEWTON
Source: Hugging Face. Updated 2023-11-14. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/NEWTONReasoning/NEWTON
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: confident_questions
    path: "confident_questions.csv"
  - split: explicit_questions
    path: "explicit_questions.csv"
  - split: implicit_questions
    path: "implicit_questions.csv"
---
🌟 **NEWTON: Evaluating Large Language Models for Physics Reasoning** 🌟
Are you curious about the physical reasoning abilities of Large Language Models (LLMs) like GPT-4 in different contextualized settings? Look no further! NEWTON is here to help.
🚀 **What is NEWTON?** 🚀
NEWTON is a repository and benchmark designed to assess the physics reasoning skills of LLMs. While these models excel in many language tasks, their grasp of physical concepts often remains unexplored.
🔬 **What's Inside NEWTON?** 🔬
* **Repository**: We provide a collection of 2800 object-attribute pairs, serving as a foundation for generating customizable assessment templates tailored to your specific needs.
* **Benchmark**: We've curated 160k QA questions to evaluate LLMs across foundational, explicit, and implicit physics reasoning tasks. Discover how these models perform in scenarios involving everyday objects and attributes.
* **Pipeline**: A pipeline to synthesize evaluation sets tailored to particular applications.
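The split names in the configuration at the top of this card map one-to-one onto CSV files, so a downloaded copy can be read with the Python standard library alone. The sketch below assumes the three CSVs have been saved to a local folder. `SPLIT_FILES`, `load_split`, and `data_dir` are illustrative names, not part of NEWTON, and the row keys are whatever the CSV headers contain (no column names are assumed here).

```python
import csv
from pathlib import Path

# Split name -> CSV file, copied from the `configs` block of the dataset card.
SPLIT_FILES = {
    "confident_questions": "confident_questions.csv",
    "explicit_questions": "explicit_questions.csv",
    "implicit_questions": "implicit_questions.csv",
}


def load_split(split, data_dir="."):
    """Return the rows of one split as a list of dicts keyed by the CSV header.

    `data_dir` is a hypothetical local folder holding the downloaded CSV files.
    """
    if split not in SPLIT_FILES:
        raise KeyError(f"unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}")
    path = Path(data_dir) / SPLIT_FILES[split]
    # newline="" lets the csv module handle quoted fields that contain line breaks.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A call such as `load_split("explicit_questions", "path/to/NEWTON")` returns one dict per question. If the Hugging Face `datasets` library is installed, `load_dataset("NEWTONReasoning/NEWTON", split="explicit_questions")` should fetch the same split directly from the Hub.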
🤖 **Real-World Applications** 🤖
NEWTON's potential extends beyond evaluation. It can pave the way for integrating LLMs into physically grounded settings, such as robotic manipulation.
❓ If you have any questions, please contact [me](https://helen9975.github.io/) at `yiruwang [at] cs [dot] washington [dot] edu`. ❓
Provider:
NEWTONReasoning
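Summary of the original information
NEWTON: Evaluating Large Language Models for Physics Reasoning
Introduction
NEWTON is a repository and benchmark for evaluating the physics reasoning abilities of large language models (LLMs). Although these models perform well on many language tasks, their grasp of physical concepts has often gone unexplored.
Contents
- Repository: 2800 object-attribute pairs for generating customized assessment templates.
- Benchmark: 160k QA questions for evaluating LLMs on foundational, explicit, and implicit physics reasoning tasks.
- Pipeline: a pipeline for synthesizing evaluation sets for particular applications.
Applications
NEWTON is not limited to evaluation. It can also support integrating LLMs into physically grounded settings such as robotic manipulation.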



