NEWTONReasoning/NEWTON

Source: Hugging Face · Updated: 2023-11-14 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/NEWTONReasoning/NEWTON
Description:
---
configs:
- config_name: default
  data_files:
  - split: confident_questions
    path: "confident_questions.csv"
  - split: explicit_questions
    path: "explicit_questions.csv"
  - split: implicit_questions
    path: "implicit_questions.csv"
---

🌟 **NEWTON: Evaluating Large Language Models for Physics Reasoning** 🌟

Are you curious about the physical reasoning abilities of Large Language Models (LLMs) like GPT-4 in different contextualized settings? Look no further! NEWTON is here to help.

🚀 **What is NEWTON?** 🚀

NEWTON is a repository and benchmark designed to assess the physics reasoning skills of LLMs. While these models excel at many language tasks, their grasp of physical concepts remains largely unexplored.

🔬 **What's Inside NEWTON?** 🔬

* **Repository**: We provide a collection of 2800 object-attribute pairs, serving as a foundation for generating customizable assessment templates tailored to your specific needs.
* **Benchmark**: We've curated 160k questions to evaluate LLMs across foundational, explicit, and implicit physics reasoning tasks. Discover how these models perform in scenarios involving everyday objects and attributes.
* **Pipeline**: A pipeline to synthesize evaluation sets tailored to particular applications.

🤖 **Real-World Applications** 🤖

NEWTON's potential extends beyond evaluation. It can pave the way for integrating LLMs into physically grounded settings, such as robotic manipulation.

❓ If you have any questions, please contact [me](https://helen9975.github.io/) at `yiruwang [at] cs [dot] washington [dot] edu`. ❓
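The `configs` block declares three splits, each backed by a single CSV file. Below is a minimal sketch of loading one of them with the Hugging Face `datasets` library. It assumes `datasets` is installed and the Hub is reachable. The helper name `load_newton_split` is hypothetical and is not part of the dataset card; only the repository ID, split names, and CSV paths come from the page.

```python
"""Sketch: load one NEWTON split with the Hugging Face `datasets` library.

Assumes `datasets` is installed and the Hub (or a mirror) is reachable.
The helper `load_newton_split` is a hypothetical convenience wrapper.
"""

# Split name -> CSV file, copied from the dataset card's `configs` block.
SPLIT_FILES = {
    "confident_questions": "confident_questions.csv",
    "explicit_questions": "explicit_questions.csv",
    "implicit_questions": "implicit_questions.csv",
}

REPO_ID = "NEWTONReasoning/NEWTON"


def load_newton_split(split: str):
    """Return one NEWTON split as a `datasets.Dataset`.

    Raises ValueError for split names not declared in the card's config,
    before any network access happens.
    """
    if split not in SPLIT_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}"
        )
    # Imported lazily so the split table above is usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

A call such as `load_newton_split("explicit_questions")` downloads and caches the CSV on first use. The page does not document the CSV columns, so inspect `ds.column_names` after loading. To route downloads through the hf-mirror.com link listed on this page, the `HF_ENDPOINT` environment variable is generally set to `https://hf-mirror.com` before `datasets` is first imported.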
Provided by:
NEWTONReasoning
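Summary of original information

NEWTON: Evaluating the Physics Reasoning Abilities of Large Language Models

Introduction

NEWTON is a repository and benchmark for evaluating the physics reasoning abilities of large language models (LLMs). Although these models perform well on many language tasks, their understanding of physical concepts remains largely unexplored.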

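Contents

  • Repository: 2800 object-attribute pairs for generating customized evaluation templates.
  • Benchmark: 160k questions for evaluating LLMs on foundational, explicit, and implicit physics reasoning tasks.
  • Pipeline: a pipeline for synthesizing evaluation sets tailored to specific applications.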
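The repository and pipeline entries describe producing evaluation questions by filling templates with object-attribute pairs. The toy sketch below illustrates that general idea under loud assumptions: the `ObjectAttribute` record, its numeric `rating`, the comparison `TEMPLATE`, and the sample objects are all hypothetical. They are not NEWTON's actual schema, templates, data, or generation code.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ObjectAttribute:
    """Hypothetical record: an object, an attribute, and a numeric rating."""
    obj: str
    attribute: str
    rating: float  # assumed scale: higher means "more" of the attribute


# Hypothetical template; NEWTON's real templates are not shown on this page.
TEMPLATE = "Which is more {attribute}: {a} or {b}?"


def make_comparison_questions(pairs):
    """Yield (question, answer) tuples comparing objects that share an attribute.

    Pairs with tied ratings are skipped because they have no single answer.
    """
    by_attribute = {}
    for p in pairs:
        by_attribute.setdefault(p.attribute, []).append(p)
    for attribute, items in sorted(by_attribute.items()):
        for x, y in combinations(items, 2):
            if x.rating == y.rating:
                continue
            answer = x.obj if x.rating > y.rating else y.obj
            question = TEMPLATE.format(attribute=attribute, a=x.obj, b=y.obj)
            yield question, answer


# Illustrative inputs only: these ratings are made up, not NEWTON data.
sample = [
    ObjectAttribute("glass cup", "brittle", 0.9),
    ObjectAttribute("rubber ball", "brittle", 0.1),
    ObjectAttribute("ceramic plate", "brittle", 0.8),
]

for q, a in make_comparison_questions(sample):
    print(q, "->", a)
```

Grouping by attribute before pairing keeps each question about a single physical property, and skipping ties avoids questions that have no single correct answer.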

Applications

Beyond evaluation, NEWTON can also support integrating LLMs into physically grounded settings such as robotic manipulation.
