GeoSense
Source: ModelScope community. Updated 2026-01-06; indexed 2025-05-17.
Download link:
https://modelscope.cn/datasets/OpenStellarTeam/GeoSense
Description:
# Overview
<p align="center">
🌐 <a href="https://gfzshiwai.github.io/GeoSense_Project/" target="_blank">Website</a> • 🤗 <a href="https://huggingface.co/datasets/OpenStellarTeam/GeoSense" target="_blank">Hugging Face</a> • ⏬ <a href="#data" target="_blank">Data</a> • 📃 <a href="https://arxiv.org/abs/2504.12597" target="_blank">Paper</a><br>
</p>
## Dataset
**GeoSense** is the first comprehensive bilingual benchmark designed to systematically evaluate the geometric reasoning abilities of multimodal large language models (MLLMs) through the lens of **geometric principles**. GeoSense features a **five-level hierarchical** framework of geometric principles spanning plane and solid geometry, an **intricately annotated dataset** of 1,789 problems, and an **innovative evaluation strategy**.
Please visit our [website](https://gfzshiwai.github.io/GeoSense_Project/) or check our [paper](https://arxiv.org/abs/2504.12597) for more details.
> This is the evaluation repository for GeoSense, and it follows the MIT License.
## 💫 Introduction
To comprehensively assess the reasoning abilities of MLLMs, we present **GeoSense**, a dataset of 1,789 high-quality questions covering 148 geometric principles (definitions, theorems, and formulas) that span plane and solid geometry. Its key features are as follows:
* **Five-level hierarchical framework of geometric principles:** GeoSense establishes a five-level knowledge hierarchy of 148 geometric principles, covering 65 definitions, 47 theorems, and 36 computation formulas in both plane and solid geometry. This enables a multidimensional, fine-grained evaluation of a model's ability to identify and apply knowledge when faced with geometric problems.
* 🍀**Intricately annotated dataset:** GeoSense collects 1,789 geometric problems and provides detailed bilingual annotations for the 5,556 geometric principles needed to solve them, including how each principle corresponds to and is applied to elements of the geometric diagrams. Special tags (\<note>) mark key points in the solutions to support comprehensive and accurate model evaluation. GeoSense follows a rigorous construction process, with 23 graduate students specializing in geometry carrying out data annotation, review, and quality control.
* ⚡**An innovative evaluation strategy:** GeoSense introduces two novel metrics: GPI (Geometric Principles Identification) and GPA (Geometric Principles Application). These metrics assess a model’s ability to identify and apply geometric principles in complex visual scenarios, helping to reveal shortcomings and areas for improvement in its reasoning process.

Based on GeoSense, we have conducted a comprehensive evaluation of the reasoning capabilities of MLLMs, and we maintain a leaderboard of the results.
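
The figures quoted in the feature list can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not part of the GeoSense tooling; it reads the 5,556 figure as the number of principle annotations across all problems, which is an interpretation rather than something the card states explicitly, and all variable names are illustrative.

```python
# Counts quoted in the feature list above.
definitions, theorems, formulas = 65, 47, 36
problems = 1789
principle_annotations = 5556  # assumed to count annotations, not distinct principles

# The three categories should add up to the 148 principles in the hierarchy.
total_principles = definitions + theorems + formulas
print(total_principles)  # 148

# Average number of annotated principles per problem under that reading.
avg_per_problem = principle_annotations / problems
print(round(avg_per_problem, 2))  # 3.11
```

Under this reading, each problem is annotated with roughly three principles on average.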
## 📊 Leaderboard
Please visit our [website](https://gfzshiwai.github.io/GeoSense_Project/#leaderboard).
## ⚖️ Evals
Please visit our [GitHub repository](https://github.com/OpenStellarTeam/GeoSense/tree/main).
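
The official evaluation code is in the repository linked above, and the exact definitions of GPI and GPA are given in the paper. Purely as an illustration of what identification and application scores of this kind could look like, here is a minimal sketch that assumes each annotated principle receives two boolean verdicts. The `PrincipleJudgement` class, both functions, and the ratio-based scoring are hypothetical and should not be taken as the authors' method.

```python
from dataclasses import dataclass


@dataclass
class PrincipleJudgement:
    """Hypothetical verdict for one annotated principle in one problem."""
    name: str
    identified: bool  # the model's answer recognises the principle
    applied: bool     # the model applies it correctly to the diagram


def identification_rate(judgements):
    """Share of annotated principles identified (an assumed GPI-like score)."""
    if not judgements:
        return 0.0
    return sum(j.identified for j in judgements) / len(judgements)


def application_rate(judgements):
    """Share of annotated principles both identified and correctly applied
    (an assumed GPA-like score)."""
    if not judgements:
        return 0.0
    return sum(j.identified and j.applied for j in judgements) / len(judgements)


# Made-up verdicts for a single problem, for illustration only.
example = [
    PrincipleJudgement("Pythagorean theorem", True, True),
    PrincipleJudgement("Definition of a right triangle", True, False),
    PrincipleJudgement("Area formula of a triangle", False, False),
    PrincipleJudgement("Definition of a midpoint", True, True),
]

gpi_like = identification_rate(example)
gpa_like = application_rate(example)
print(gpi_like, gpa_like)  # 0.75 0.5
```

Keeping the two scores separate shows whether a model fails to recognise the relevant principle at all or recognises it but misapplies it, which is the distinction the two metrics are described as drawing.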
## Citation
Please cite our paper if you use our dataset.
```
@misc{xu2025geosenseevaluatingidentificationapplication,
title={GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning},
author={Liangyu Xu and Yingxiu Zhao and Jingyun Wang and Yingyao Wang and Bu Pi and Chen Wang and Mingliang Zhang and Jihao Gu and Xiang Li and Xiaoyong Zhu and Jun Song and Bo Zheng},
year={2025},
eprint={2504.12597},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.12597},
}
```
Provider: maas
Created: 2025-05-10
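Background overview:
GeoSense is a bilingual benchmark dataset focused on evaluating geometric reasoning. It contains 1,789 geometry problems covering 148 geometric principles, and it uses a five-level hierarchical framework together with an innovative evaluation strategy (the GPI and GPA metrics) to assess model capabilities along multiple dimensions.
The summary above was collected and generated by 遇见数据集.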



