five

CombiBench

收藏
魔搭社区2025-12-04 更新2025-05-03 收录
下载链接:
https://modelscope.cn/datasets/AI-MO/CombiBench
下载链接
链接失效反馈
官方服务:
资源简介:
# CombiBench <p align="center"> <a href="https://github.com/MoonshotAI/CombiBench/"><img src="https://img.shields.io/badge/🚀-github-black"</a> </p> ## CombiBench is the first benchmark focused on combinatorial problems, based on the formal language Lean 4. CombiBench is a manually produced benchmark, including 100 combinatorial mathematics problems of varying difficulty and knowledge levels. It aims to provide a benchmark for evaluating the combinatorial mathematics capabilities of automated theorem proving systems to advance the field. For problems that require providing a solution first and then proving its correctness, we have referred to the style of [PutnamBench](https://github.com/trishullab/PutnamBench). We are hosting a [**leaderboard**](https://moonshotai.github.io/CombiBench/leaderboard.html) and will readily receive evaluation results which are accompanied by a preprint or publication. Please reach out privately at `liujunqi@amss.ac.cn` with any requests for additions to the leaderboard. ## Statistics We collected all combinatorics problems from the official IMO problems since 2000, except for one problem that relies on a figure. And We selected problems through random sampling from 14 chapters in the book, choosing 3 problems from each chapter, ensuring that the 42 problems are evenly distributed across all 14 chapters. We randomly selected 10 simple combinatorics problems at the middle school level from a mathematics problem collection website [hackmath](https://www.hackmath.net/). Then, we randomly collected 12 problems from other mathematics competitions. | Source | Count | | ---------------- | -------------- | | Hackmath | 10 | | Brualdi's book | 42 | | IMO | 36 | | APMO | 2 | | Balticway | 1 | | EGMO | 1 | | IMO-Shortlist | 4 | | IZHO | 2 | | BXMO | 1 | | USAMO | 1 | Note : The complete proofs of Problem 3 and Problem 5 from IMO 2024 have already been formalized in [mathlib4/Archive/Imo2024Q3](https://leanprover-community.github.io/mathlib4_docs/Archive/Imo/Imo2024Q3.html) and [mathlib4/Archive/Imo2024Q5](https://leanprover-community.github.io/mathlib4_docs/Archive/Imo/Imo2024Q5.html). Therefore, we directly refer to the statements of these problems, along with the necessary definitions used in the statements. We are very grateful to Joseph Myers, the author of these two problems. We also appreciate his suggestions on the formalization of our problems. ## Evaluation Our evaluation code is released at [https://github.com/MoonshotAI/CombiBench](https://github.com/MoonshotAI/CombiBench) ## 🙌 Contributing Contributions are welcome! If anyone notices any mistakes, please raise an issue on the repository and we will address it. ## 📝 License This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for full details.

# CombiBench <p align="center"> <a href="https://github.com/MoonshotAI/CombiBench/"><img src="https://img.shields.io/badge/🚀-github-black"</a> </p> ## CombiBench是首个基于形式化语言Lean 4的组合问题基准测试集。该基准集由人工构建,包含100道难度与知识层级各异的组合数学问题,旨在为评估自动定理证明系统的组合数学能力提供评测基准,以推动该领域的发展。对于需要先给出解答再证明其正确性的问题,我们参考了[PutnamBench](https://github.com/trishullab/PutnamBench)的范式。 我们搭建了[**排行榜**](https://moonshotai.github.io/CombiBench/leaderboard.html),并乐于接收附带预印本或已发表论文的评测结果。如需向排行榜新增条目,请通过`liujunqi@amss.ac.cn`私信联系。 ## 统计信息 我们收集了2000年以来官方发布的国际数学奥林匹克(International Mathematical Olympiad,IMO)试题中的全部组合数学问题,仅排除一道依赖配图的题目。此外,我们从布鲁阿尔迪专著的14个章节中通过随机抽样选取题目,每个章节选取3道,确保42道题目均匀分布于全部14个章节。我们从数学题库网站[hackmath](https://www.hackmath.net/)中随机选取了10道初中水平的简单组合数学问题。随后,我们从其他数学竞赛中随机收集了12道题目。 | 来源 | 数量 | | ---------------- | -------------- | | Hackmath | 10 | | 布鲁阿尔迪专著 | 42 | | IMO | 36 | | APMO | 2 | | Balticway | 1 | | EGMO | 1 | | IMO-Shortlist | 4 | | IZHO | 2 | | BXMO | 1 | | USAMO | 1 | 注意:2024年国际数学奥林匹克第3题和第5题的完整证明已在[mathlib4/Archive/Imo2024Q3](https://leanprover-community.github.io/mathlib4_docs/Archive/Imo/Imo2024Q3.html)和[mathlib4/Archive/Imo2024Q5](https://leanprover-community.github.io/mathlib4_docs/Archive/Imo/Imo2024Q5.html)中完成形式化。因此,我们直接采用了这两道题的题干以及题干中用到的必要定义。我们十分感谢这两道题的作者约瑟夫·迈尔斯(Joseph Myers),同时也感谢他对我们题目形式化工作提出的建议。 ## 评测 我们的评测代码已发布于[https://github.com/MoonshotAI/CombiBench](https://github.com/MoonshotAI/CombiBench) ## 🙌 贡献 欢迎贡献!若您发现任何错误,请在仓库中提交议题,我们将及时处理。 ## 📝 许可证 本项目采用MIT许可证授权,完整细节请参见[LICENSE](./LICENSE)文件。
提供机构:
maas
创建时间:
2025-04-29
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作