introvoyz041/HiPhO
Source: Hugging Face. Updated: 2025-12-19. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/introvoyz041/HiPhO
Resource description:
HiPhO (High School Physics Olympiad Benchmark) is the first benchmark specifically designed to evaluate the physical reasoning abilities of (M)LLMs on real-world Physics Olympiads from 2024–2025. The dataset contains 360 problems from 13 Physics Olympiads, categorized across 5 Physics Fields (Mechanics, Electromagnetism, Thermodynamics, Optics, Modern Physics) and 4 Modality Types (Text-Only, Text+Illustration Figure, Text+Variable Figure, Text+Data Figure). Evaluation is conducted using answer-level and step-level scoring, aligned with official marking schemes, and maps model scores to medal levels (Gold/Silver/Bronze) for comparison with human performance.
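The medal mapping mentioned in the description can be illustrated with a minimal sketch. The cutoff values, the `EXAMPLE_THRESHOLDS` name, and the `medal_for_score` helper below are all hypothetical and chosen only for illustration; the description does not state the actual per-competition thresholds or how the benchmark's own evaluation code is organized.

```python
from typing import Mapping

# Hypothetical cutoffs for illustration only. Real medal thresholds differ
# per competition and are not given in the dataset description.
EXAMPLE_THRESHOLDS = {"Gold": 30.0, "Silver": 22.0, "Bronze": 15.0}


def medal_for_score(score: float, thresholds: Mapping[str, float]) -> str:
    """Return the highest medal whose cutoff the score meets, else 'No medal'."""
    # Walk the medals from the highest cutoff down to the lowest.
    ranked = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for medal, cutoff in ranked:
        if score >= cutoff:
            return medal
    return "No medal"


if __name__ == "__main__":
    for s in (31.0, 22.0, 15.5, 3.0):
        print(s, medal_for_score(s, EXAMPLE_THRESHOLDS))
```

Under these assumed cutoffs, a model's total exam score (the sum of its answer-level and step-level points) would be compared against the same thresholds that applied to human contestants in that competition.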
Provided by:
introvoyz041



