AlgGeoTest
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-09-13.
Download link:
https://modelscope.cn/datasets/PKU-DS-LAB/AlgGeoTest
Description:
# Welcome to AlgGeoTest created by PKU-DS-LAB!
## Citation Information
Paper Link: https://arxiv.org/abs/2508.02208
## Dataset Description
AlgGeoTest is the first benchmark specifically designed to evaluate LLMs' comprehension of Algebraic Geometry—a frontier domain of modern mathematics that occupies a central position within the contemporary mathematical landscape.
AlgGeoTest was created by implementing [Proof2Hybrid](https://arxiv.org/abs/2508.02208)—the first fully-automated framework for mathematical benchmark synthesis—on [The Stacks project](https://stacks.math.columbia.edu/), an open source Algebraic Geometry textbook and reference work.
Key characteristics of AlgGeoTest include:
- **Proof Centric**: Questions are based on mathematical proofs or definitions, whose correctness is typically regarded as hard to judge.
- **Fully Automated**: AlgGeoTest was created using the fully-automated framework Proof2Hybrid, so the benchmark can be scaled up in size.
- **Natural Language**: AlgGeoTest stems entirely from natural mathematical corpora and requires no human annotation.
AlgGeoTest consists of 456 carefully designed questions in English. Each question contains 6 options, four of which are mathematically correct and two of which are mathematically incorrect. The questions are hybrid-formatted, which prevents LLMs from guessing the correct answer by comparing options against each other, and mitigates the bias that can arise when different LLMs apply different standards of mathematical correctness.
## Dataset Structure
Each entry in the benchmark contains six fields named A, B, C, D, E and F, one for each of the six options, and a field named answer, which holds the correct answer. Each option is a dictionary containing the following fields:
- **tag**: The tag in The Stacks project that this option originates from.
- **type**: The type of this option, either definition or proposition (which stands for a proposition-proof pair).
- **proposition**: The proposition of the proposition-proof pair. Only exists if the option is of type proposition.
- **text**: The definition if the option is of type definition, or the proof of the proposition-proof pair if the option is of type proposition.
- **ground_truth**: Whether this option is mathematically correct or mathematically incorrect.
The benchmark is provided as a JSONL file with each line being an entry.
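The structure described above can be checked with a short script. The following is a minimal sketch, not the authors' tooling: the helper names (`parse_entries`, `check_entry`, `make_placeholder_option`) are hypothetical, and the sample entry uses made-up placeholder values because the exact encoding of `answer`, `tag`, and `ground_truth` is not specified here. Only the field names and the two option types come from the description above.

```python
import json
from typing import Any, Dict, Iterable, List

OPTION_KEYS = ("A", "B", "C", "D", "E", "F")
REQUIRED_FIELDS = ("tag", "type", "text", "ground_truth")


def parse_entries(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL content: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def check_entry(entry: Dict[str, Any]) -> List[str]:
    """Return the structural problems found in one entry (empty list if none)."""
    problems = []
    if "answer" not in entry:
        problems.append("missing field: answer")
    for key in OPTION_KEYS:
        option = entry.get(key)
        if not isinstance(option, dict):
            problems.append(f"{key}: missing or not a dictionary")
            continue
        for field in REQUIRED_FIELDS:
            if field not in option:
                problems.append(f"{key}: missing field {field}")
        kind = option.get("type")
        if kind not in ("definition", "proposition"):
            problems.append(f"{key}: unexpected type {kind!r}")
        elif kind == "proposition" and "proposition" not in option:
            problems.append(f"{key}: proposition option without a proposition field")
    return problems


def make_placeholder_option(kind: str) -> Dict[str, Any]:
    """Build a made-up option; every value here is a placeholder, not real data."""
    option = {
        "tag": "XXXX",                  # placeholder, not a real Stacks project tag
        "type": kind,
        "text": "placeholder text",
        "ground_truth": "placeholder",  # real encoding is an unknown here
    }
    if kind == "proposition":
        option["proposition"] = "placeholder statement"
    return option


# Synthetic entry that alternates definition and proposition options.
sample_entry = {
    key: make_placeholder_option("proposition" if i % 2 else "definition")
    for i, key in enumerate(OPTION_KEYS)
}
sample_entry["answer"] = "placeholder"  # real encoding is an unknown here

# In practice the lines would come from the downloaded JSONL file.
jsonl_lines = [json.dumps(sample_entry), ""]
for index, entry in enumerate(parse_entries(jsonl_lines)):
    print(index, check_entry(entry) or "ok")
```

To run the same check on the real file, pass an open file handle to `parse_entries` instead of `jsonl_lines`; a file object yields its lines one at a time, which matches the one-entry-per-line layout.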
## Experiment Result


Provider:
maas
Created:
2025-08-06



