AlgGeoTest

Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-09-13.
Download link: https://modelscope.cn/datasets/PKU-DS-LAB/AlgGeoTest
# Welcome to AlgGeoTest, created by PKU-DS-LAB!

## Citation Information

Paper Link: https://arxiv.org/abs/2508.02208

## Dataset Description

AlgGeoTest is the first benchmark specifically designed to evaluate LLMs' comprehension of Algebraic Geometry, a frontier domain of modern mathematics that occupies a central position within the contemporary mathematical landscape.

AlgGeoTest was created by applying [Proof2Hybrid](https://arxiv.org/abs/2508.02208), the first fully automated framework for mathematical benchmark synthesis, to [The Stacks project](https://stacks.math.columbia.edu/), an open source Algebraic Geometry textbook and reference work.

Key characteristics of AlgGeoTest include:

- **Proof Centric**: Questions are based on mathematical proofs or definitions, which are typically regarded as hard to verify.
- **Fully Automated**: AlgGeoTest was created using the fully automated framework Proof2Hybrid, so it can be scaled up in size.
- **Natural Language**: AlgGeoTest is derived entirely from natural-language mathematical corpora and requires no human annotation.

AlgGeoTest consists of 456 carefully designed questions in English. Each question contains six options, four of which are mathematically correct and two of which are mathematically incorrect. Questions are hybrid-formatted, which prevents LLMs from guessing the correct answer by comparing options against each other and reduces possible bias from differing standards of mathematical correctness across LLMs.

## Dataset Structure

Each entry in the benchmark contains six fields named A, B, C, D, E and F, one per option, and a field named answer that holds the correct answer. Each option is a dictionary containing the following fields:

- **tag**: The tag in The Stacks project that this option originates from.
- **type**: The type of this option, either definition or proposition (the latter stands for a proposition-proof pair).
- **proposition**: The proposition of the proposition-proof pair. Only exists if the option is of type proposition.
- **text**: The definition if the option is of type definition, or the proof of the proposition-proof pair if the option is of type proposition.
- **ground_truth**: Whether this option is mathematically correct or mathematically incorrect.

The benchmark is provided as a JSONL file, with each line being one entry.

## Experiment Result

![Evaluation Results of AlgGeoTest](figures/eval_results.jpg)
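The entry layout described under Dataset Structure can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the helper names (`iter_entries`, `describe_option`, `summarize`) are hypothetical, and the sample entry is built from placeholder values because the card does not specify the value formats of `answer` and `ground_truth`. The code passes those two fields through without interpreting them.

```python
import json
from typing import Iterable, Iterator

# Field names taken from the Dataset Structure section.
OPTION_KEYS = ("A", "B", "C", "D", "E", "F")


def iter_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed entry per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def describe_option(option: dict) -> str:
    """Return the text a reader would judge for one option."""
    if option["type"] == "proposition":
        # Proposition options carry the statement and its proof separately.
        return f"Proposition: {option['proposition']}\nProof: {option['text']}"
    return f"Definition: {option['text']}"


def summarize(entry: dict) -> dict:
    """Collect tags and option-type counts; 'answer' is passed through as-is."""
    types = [entry[key]["type"] for key in OPTION_KEYS]
    return {
        "tags": [entry[key]["tag"] for key in OPTION_KEYS],
        "n_definitions": types.count("definition"),
        "n_propositions": types.count("proposition"),
        "answer": entry["answer"],
    }


def _placeholder_option(tag: str, kind: str) -> dict:
    """Build a fake option; every value here is a placeholder, not real data."""
    option = {
        "tag": tag,
        "type": kind,
        "text": f"text for {tag}",
        "ground_truth": "<placeholder>",  # real value format is not documented here
    }
    if kind == "proposition":
        option["proposition"] = f"statement for {tag}"
    return option


# Hypothetical entry that only mirrors the documented field layout.
sample_entry = {
    key: _placeholder_option(f"TAG{i}", "definition" if i % 2 else "proposition")
    for i, key in enumerate(OPTION_KEYS)
}
sample_entry["answer"] = "<placeholder>"  # real value format is not documented here

entries = list(iter_entries([json.dumps(sample_entry), ""]))
summary = summarize(entries[0])
print(summary["n_definitions"], summary["n_propositions"])
print(describe_option(entries[0]["A"]))
```

To read the downloaded benchmark instead of the placeholder entry, pass an open file handle for the JSONL file to `iter_entries`. The card does not give the file name, so the path has to come from the download.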

Provider: maas
Created: 2025-08-06