PKU-DS-LAB/AlgGeoTest
Source: Hugging Face | Updated: 2025-08-05 | Indexed: 2025-08-09
Download link:
https://hf-mirror.com/datasets/PKU-DS-LAB/AlgGeoTest
Description:
AlgGeoTest is a benchmark designed to evaluate how well large language models (LLMs) understand Algebraic Geometry, a frontier domain of modern mathematics that occupies a central position in the contemporary mathematical landscape. It was created by applying Proof2Hybrid, described as the first fully automated framework for mathematical benchmark synthesis, to The Stacks project, an open-source Algebraic Geometry textbook and reference work. The key characteristics of AlgGeoTest are: Proof-centric: questions are based on mathematical proofs or definitions, which are typically regarded as hard to verify; Fully automated: the benchmark was generated with the Proof2Hybrid framework, so it can scale in size; Natural language: the questions stem entirely from natural mathematical corpora and require no human annotation. AlgGeoTest consists of 456 carefully designed questions in English, each with 6 options, of which four are mathematically correct and two are mathematically incorrect. The questions use a hybrid format, which prevents LLMs from guessing the correct answer by comparing options against each other and reduces the bias that could arise from different LLMs applying different standards of mathematical correctness.
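To make the six-option format concrete, here is a minimal Python sketch of how a single question might be scored when a model gives a true/false judgment for each option. The record layout (`Question`, `options`, `is_correct`) and both scoring rules are assumptions made for illustration; this listing does not state the dataset's actual field names or its official scoring method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    # Hypothetical record layout; the real dataset's field names may differ.
    options: List[str]      # six candidate statements
    is_correct: List[bool]  # ground truth per option (four True, two False)


def option_accuracy(truth: List[bool], judged: List[bool]) -> float:
    """Fraction of options whose true/false judgment matches the ground truth."""
    if len(truth) != len(judged):
        raise ValueError("truth and judged must have the same length")
    return sum(t == j for t, j in zip(truth, judged)) / len(truth)


def strict_score(truth: List[bool], judged: List[bool]) -> int:
    """Assumed all-or-nothing rule: 1 only if every option is judged correctly."""
    return int(list(truth) == list(judged))


# Illustrative question with placeholder statements (not real dataset content).
question = Question(
    options=[f"statement {i}" for i in range(6)],
    is_correct=[True, True, False, True, False, True],
)

# A model that misjudges exactly one option (the fifth).
judged = [True, True, False, True, True, True]

per_option = option_accuracy(question.is_correct, judged)
strict = strict_score(question.is_correct, judged)
```

Under the assumed all-or-nothing rule, independent coin-flip judgments on six options would get every option right with probability (1/2)^6 = 1/64, which illustrates why judging each option on its own is harder to game than picking one answer out of several.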
Provided by:
PKU-DS-LAB



