harshsingh-mathongo/SeePhys
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/harshsingh-mathongo/SeePhys
Resource description:
---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: dev
  data_files:
  - split: total
    path: dev/total-*
  - split: dev
    path: dev/dev-*
dataset_info:
- config_name: default
  features:
  - name: index
    dtype: int64
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: images
    list: image
  - name: reasoning
    dtype: string
  - name: sig_figs
    dtype: string
  - name: level
    dtype: int64
  - name: subject
    dtype: string
  - name: language
    dtype: string
  - name: img_category
    dtype: string
  - name: vision_relevance
    dtype: string
  - name: caption
    dtype: string
  splits:
  - name: train
    num_bytes: 90258908.0
    num_examples: 2000
  download_size: 77879212
  dataset_size: 90258908.0
- config_name: dev
  features:
  - name: question
    dtype: string
  - name: subject
    dtype: string
  - name: image_path
    sequence: string
  - name: sig_figs
    dtype: string
  - name: level
    dtype: int64
  - name: language
    dtype: string
  - name: index
    dtype: int64
  - name: img_category
    dtype: string
  - name: vision_relevance
    dtype: string
  - name: caption
    dtype: string
  - name: image_0
    dtype: image
  - name: image_1
    dtype: image
  - name: image_2
    dtype: image
  - name: image_3
    dtype: image
  splits:
  - name: total
    num_bytes: 96133884.0
    num_examples: 2000
  - name: dev
    num_bytes: 9343791.0
    num_examples: 200
  download_size: 86916417
  dataset_size: 105477675.0
task_categories:
- question-answering
- visual-question-answering
language:
- en
tags:
- physics
- multi-modal
size_categories:
- 1K<n<10K
---
# SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
Can AI truly see physics? Test your model with the newly released SeePhys benchmark!
SeePhys comprises 2,000 vision-text multimodal physics problems ranging from middle-school to doctoral qualifying-exam level. It systematically evaluates LLMs and MLLMs on tasks that combine complex scientific diagrams with theoretical derivations. Experiments show that even SOTA models such as Gemini-2.5-Pro and o4-mini stay below 55% accuracy and have error rates above 30% on simple middle-school-level problems, which highlights significant challenges in multimodal reasoning.
The benchmark is now open for evaluation at the ICML 2025 AI for MATH Workshop. Academic and industrial teams are invited to test their models!
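As a starting point for testing a model, the sketch below shows one way to load a split and to read a record's images uniformly across the two configs described in the metadata (`default` stores an `images` list, `dev` stores separate `image_0` to `image_3` columns). The helper names `load_seephys` and `collect_images` are hypothetical, and the assumption that unused `image_*` slots are `None` is not confirmed by this page.

```python
from typing import Any, Dict, List

REPO_ID = "harshsingh-mathongo/SeePhys"  # repository named on this page


def load_seephys(config: str = "default", split: str = "train"):
    """Load one split; requires `pip install datasets` and network access."""
    # Imported lazily so collect_images() below stays usable offline.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)


def collect_images(record: Dict[str, Any]) -> List[Any]:
    """Return a record's images as one list, whichever config it came from."""
    if "images" in record:
        # default config: a single list-valued column
        return list(record["images"] or [])
    # dev config: up to four separate columns, image_0 .. image_3.
    # Assumption: slots without an image hold None.
    slots = (record.get(f"image_{i}") for i in range(4))
    return [img for img in slots if img is not None]
```

For example, `load_seephys("dev", "dev")` would request the 200-example `dev` split, and `collect_images(row)` could then be applied to each row before passing the question and images to a model.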
🔗 Key Links:
📜Paper: http://arxiv.org/abs/2505.19099
⚛️Project Page: https://seephys.github.io/
🏆Challenge Submission: https://www.codabench.org/competitions/7925/
➡️Competition Guidelines: https://sites.google.com/view/ai4mathworkshopicml2025/challenge
The answers will be released on July 1st, 2025 (Anywhere on Earth, AoE), after the submission deadline for the ICML 2025 Challenges on Automated Math Reasoning and Extensions.
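For local sanity checks on records that carry an `answer` field (the `default` config lists one; the `dev` config does not), accuracy can be broken down by the `level` field. The sketch below is a minimal illustration that assumes exact string matching after whitespace and case normalization. This is a simplification and not the official scoring procedure, which this page does not describe. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace to ignore trivial formatting differences."""
    return " ".join(text.lower().split())


def accuracy_by_level(
    records: Iterable[Mapping], predictions: Mapping[int, str]
) -> Dict[int, float]:
    """Exact-match accuracy per `level`; predictions are keyed by record `index`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        level = rec["level"]
        total[level] += 1
        pred = predictions.get(rec["index"])  # a missing prediction counts as wrong
        if pred is not None and normalize(pred) == normalize(rec["answer"]):
            correct[level] += 1
    return {level: correct[level] / total[level] for level in sorted(total)}
```

Free-form physics answers often differ in units, notation, or significant figures, so a real evaluation would likely need a more tolerant comparison than the exact match used here.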
If you find SeePhys useful for your research and applications, please kindly cite using this BibTeX:
```
@article{xiang2025seephys,
  title={SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning},
  author={Kun Xiang and Heng Li and Terry Jingchen Zhang and Yinya Huang and Zirong Liu and Peixin Qu and Jixi He and Jiaqi Chen and Yu-Jie Yuan and Jianhua Han and Hang Xu and Hanhui Li and Mrinmaya Sachan and Xiaodan Liang},
  journal={arXiv preprint arXiv:2505.19099},
  year={2025}
}
```
Provided by:
harshsingh-mathongo



