
Guji-Math: An Evaluation Benchmark for Assessing the Effectiveness of Reasoning Models in Solving Ancient Chinese Mathematical Problems

Science Data Bank, updated 2025-06-13, indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=d5a0bfefae1b4a8a92cb1a4fbc316949
Description:
As one of the earliest countries in the world to develop mathematics, China accumulated a vast wealth of valuable mathematical resources throughout its long history. To promote the revitalization and utilization of ancient Chinese mathematical resources in the era of generative AI, this study designs Guji-Math, a benchmark of ancient Chinese mathematical problems for evaluating reasoning models. Guji-Math is constructed from the Suanjing Shishu (《算经十书》), the most renowned compendium of ancient Chinese mathematics. Leveraging the "Question-Answer-Solution" textual structure of these texts, the benchmark builds verifiable question-answer pairs. Through semi-automatic annotation, each problem is assigned one of four difficulty levels and one of 15 problem types, resulting in a collection of 538 mathematical questions and 511 solution methods. The benchmark provides two evaluation modes: closed-book, which measures a reasoning model's accuracy without external assistance, and open-book, in which the model may reference only the original solution methods from the texts.
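The two evaluation modes can be illustrated with a minimal Python sketch. It is not the benchmark's official harness: the `Problem` fields, the prompt wording, and the exact-match scoring rule are all assumptions made for illustration, since this listing does not specify the dataset schema or the scoring procedure. Treating the solution method as optional is likewise an inference from the counts above (538 questions, 511 solution methods).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Problem:
    # Hypothetical field names; the real dataset schema may differ.
    question: str
    answer: str
    solution: Optional[str] = None  # original solution method, if the text gives one


def build_prompt(problem: Problem, open_book: bool) -> str:
    """Closed-book: question only. Open-book: question plus the original method."""
    parts = [f"Question: {problem.question}"]
    if open_book and problem.solution is not None:
        parts.append(f"Original solution method: {problem.solution}")
    parts.append("Answer:")
    return "\n".join(parts)


def normalize(text: str) -> str:
    """Remove all whitespace so trivial formatting differences do not count as errors."""
    return "".join(text.split())


def accuracy(
    problems: Iterable[Problem],
    model: Callable[[str], str],
    open_book: bool,
) -> float:
    """Fraction of problems whose model output exactly matches the reference answer.

    Exact match after whitespace removal is an assumed scoring rule.
    """
    items = list(problems)
    if not items:
        return 0.0
    correct = sum(
        normalize(model(build_prompt(p, open_book))) == normalize(p.answer)
        for p in items
    )
    return correct / len(items)
```

Here `model` stands for any callable that maps a prompt string to an answer string. Running `accuracy` once with `open_book=False` and once with `open_book=True` on the same problems gives the closed-book and open-book scores for comparison.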
Provided by:
Nanjing Agricultural University
Created:
2025-06-13