Guji-Math: An Evaluation Benchmark for Assessing the Effectiveness of Reasoning Models in Solving Ancient Chinese Mathematical Problems

Name: Guji-Math: An Evaluation Benchmark for Assessing the Effectiveness of Reasoning Models in Solving Ancient Chinese Mathematical Problems
Creator: Nanjing Agricultural University
Published: 2025-06-13 00:00:00
License: No description available

Source: Science Data Bank. Updated 2025-06-13. Indexed 2026-04-23.

Download link:

https://www.scidb.cn/detail?dataSetId=d5a0bfefae1b4a8a92cb1a4fbc316949

Description:

China was one of the earliest countries to develop mathematics and accumulated a wealth of mathematical resources over its long history. To promote the revitalization and use of these resources in the era of generative AI, this study designs Guji-Math, a benchmark of ancient Chinese mathematical problems for evaluating reasoning models. Guji-Math is constructed from the Suanjing Shishu (《算经十书》), the most renowned compendium of ancient Chinese mathematics. The benchmark uses the texts' "Question-Answer-Solution" structure to create verifiable question-answer pairs. Through semi-automatic annotation, each problem is assigned one of four difficulty levels and one of 15 problem types, yielding 538 mathematical questions and 511 solution methods. The benchmark provides two evaluation modes: closed-book, in which a model solves problems without external assistance, and open-book, in which it may reference only the original solution methods from the texts.
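
The two evaluation modes could be wired up roughly as follows. This is a minimal sketch, not the benchmark's own harness: the `Item` fields, the prompt wording, the `model` callable, and exact-match scoring are all assumptions made for illustration, and the dataset's actual file schema is not described here. The solution method is treated as optional because the description lists fewer solution methods than questions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Item:
    """One benchmark entry. Field names are hypothetical, not the dataset's schema."""
    question: str
    answer: str
    method: Optional[str] = None  # original solution method, if one is available


def build_prompt(item: Item, open_book: bool) -> str:
    """Closed-book: question only. Open-book: question plus the original method."""
    parts = ["Solve the following problem and state only the final answer."]
    if open_book and item.method:
        parts.append("Solution method from the source text: " + item.method)
    parts.append("Question: " + item.question)
    return "\n".join(parts)


def accuracy(items: Iterable[Item], model: Callable[[str], str], open_book: bool) -> float:
    """Fraction of items answered correctly under exact string match (an assumption)."""
    items = list(items)
    if not items:
        return 0.0
    correct = sum(
        model(build_prompt(item, open_book)).strip() == item.answer
        for item in items
    )
    return correct / len(items)
```

Here `model` stands for any function that maps a prompt string to an answer string. Calling `accuracy` once with `open_book=False` and once with `open_book=True` on the same items gives the closed-book and open-book scores for comparison.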

Provider:

Nanjing Agricultural University

Created:

2025-06-13
