UGPhysics

Indexed from arXiv on 2025-09-30
Download link:
https://github.com/YangLabHKUST/UGPhysics
Description:
UGPhysics is a large-scale, comprehensive benchmark for evaluating the undergraduate-level physics reasoning of large language models (LLMs). It contains 5,520 undergraduate physics problems, presented in both Chinese and English, that cover 13 subject areas, seven answer types, and four physics reasoning skills. The dataset has been rigorously screened for data leakage and comes with a model-assisted rule-based judgment (MARJ) pipeline for evaluating answer correctness.
Provider:
YangLabHKUST