five

Goedel-Pset-v1

收藏
魔搭社区2025-12-05 更新2025-07-26 收录
下载链接:
https://modelscope.cn/datasets/Goedel-LM/Goedel-Pset-v1
下载链接
链接失效反馈
官方服务:
资源简介:
<hr> <div align="center" style="line-height: 1;"> <a href="https://goedel-lm.github.io/" target="_blank" style="margin: 2px;"> <img alt="Homepage" src="https://img.shields.io/badge/%F0%9F%A4%96%20Homepage-Goedel-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://github.com/Goedel-LM/Goedel-Prover" target="_blank" style="margin: 2px;"> <img alt="Github" src="https://img.shields.io/badge/GitHub-Goedel-blue?style=flat-square&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <!-- <a href="https://huggingface.co/Goedel-LM" target="_blank" style="margin: 2px;"> <img alt="HuggingFace" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20face-Goedel-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> --> </div> <div align="center" style="line-height: 1;"> <a href="https://github.com/Goedel-LM/Goedel-Prover/blob/main/LICENSE" style="margin: 2px;"> <img alt="License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> <a href="" style="margin: 2px;"> <img alt="License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> </div> <p align="center"> <a href="#7-license">License</a> | <a href="#8-citation">Citation</a> | <a href="#9-contact">Contact</a> </p> We present Goedel-Pset, the largest open-source dataset of Lean 4 statements, consisting of 1.73 million formalized statements. This dataset is ten times larger than the [Lean Workbook](https://arxiv.org/abs/2406.03847). <p align="center"> <img width="40%" src="dataset_plot.png"> </p> To create this dataset, we trained two formalizers to convert problems from Numina into Lean 4. Each statement undergoes two tests to check its quality. First, we conduct a Compiling Correctness (CC) Test, which verifies that the formalized statement adheres to Lean syntax and successfully compiles, utilizing a placeholder “:= by sorry” for the potential proof. This step is essential for assessing the syntactical correctness of the statements. Second, we implement a Faithfulness and Completeness (FC) Test to ensure that each formalized statement accurately reflects the original informal problem. This test examines whether all assumptions, conditions, and implicit definitions are appropriately incorporated. For the FC test, we use Qwen2.5-72BInstruct. Check our arxiv paper for [details](https://arxiv.org/pdf/2502.07640). Despite using only supervised fine-tuning with expert iterating on this dataset, our Goedel-Prover-SFT (fine-tuned on DeepSeek-Prover-V1.5-base) significantly outperforms the previous best open-source model, DeepSeek-Prover-V1.5, which uses reinforcement learning. <div align="center"> | Model |Compute (Pass)| miniF2F-test | |------------------------|------------------|------------------| | TheoremLamma | 128 | 33.6% | | DeepSeek-Prover-V1 | 32 | 46.1% | | DeepSeek-Prover-V1.5-SFT | 32 | 48.2% | | DeepSeek-Prover-V1.5-RL | 32 | 50.0% | | **Goedel-Prover-SFT** | **32** | **57.6%** | |------------------------|------------------|------------------| | DeepSeek-Prover-V1.5-SFT | 3200 | 53.3% | | DeepSeek-Prover-V1.5-RL | 3200 | 54.9% | | **Goedel-Prover-SFT** | **3200** | **62.7%** | |------------------------|------------------|------------------| | DeepSeek-Prover-V1.5-SFT | 25600 | 55.8% | | DeepSeek-Prover-V1.5-RL | 25600 | 58.5% | | **Goedel-Prover-SFT** | **25600** | **64.7%** | </div> ## Citation ```latex @misc{lin2025goedelproverfrontiermodelopensource, title={Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving}, author={Yong Lin and Shange Tang and Bohan Lyu and Jiayun Wu and Hongzhou Lin and Kaiyu Yang and Jia Li and Mengzhou Xia and Danqi Chen and Sanjeev Arora and Chi Jin}, year={2025}, eprint={2502.07640}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2502.07640}, } ```

<div align="center" style="line-height: 1;"> <a href="https://goedel-lm.github.io/" target="_blank" style="margin: 2px;"> <img alt="Homepage" src="https://img.shields.io/badge/%F0%9F%A4%96%20Homepage-Goedel-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://github.com/Goedel-LM/Goedel-Prover" target="_blank" style="margin: 2px;"> <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Goedel-blue?style=flat-square&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <!-- <a href="https://huggingface.co/Goedel-LM" target="_blank" style="margin: 2px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20face-Goedel-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> --> </div> <div align="center" style="line-height: 1;"> <a href="https://github.com/Goedel-LM/Goedel-Prover/blob/main/LICENSE" style="margin: 2px;"> <img alt="许可证" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> <a href="" style="margin: 2px;"> <img alt="许可证" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> </div> <p align="center"> <a href="#7-license">许可证</a> | <a href="#8-citation">引用</a> | <a href="#9-contact">联系方式</a> </p> 我们推出了Goedel-Pset,这是目前规模最大的开源Lean 4形式化语句数据集,包含173万条形式化语句,其规模是[Lean Workbook](https://arxiv.org/abs/2406.03847)的十倍。 <p align="center"> <img width="40%" src="dataset_plot.png"> </p> 为构建该数据集,我们训练了两个形式化工具,将Numina中的问题转换为Lean 4格式。每条语句均经过两项质量检验:其一为编译正确性(Compiling Correctness, CC)测试,验证形式化语句符合Lean语法且可成功编译,我们使用占位符":= by sorry"作为待填充的证明,该步骤用于评估语句的语法正确性;其二为忠实性与完备性(Faithfulness and Completeness, FC)测试,确保每条形式化语句准确反映原始非形式化问题,该测试会检查所有假设、条件与隐式定义是否被恰当纳入。FC测试采用Qwen2.5-72BInstruct模型完成。详细信息可查阅我们的arXiv论文[详情](https://arxiv.org/pdf/2502.07640)。 尽管仅采用基于该数据集的专家迭代监督微调,我们的Goedel-Prover-SFT(基于DeepSeek-Prover-V1.5-base微调)仍显著优于此前最优的开源模型DeepSeek-Prover-V1.5(后者采用强化学习训练)。 <div align="center"> | 模型 | 计算量(Pass) | miniF2F测试集准确率 | |------------------------|------------------|------------------| | TheoremLamma | 128 | 33.6% | | DeepSeek-Prover-V1 | 32 | 46.1% | | DeepSeek-Prover-V1.5-SFT | 32 | 48.2% | | DeepSeek-Prover-V1.5-RL | 32 | 50.0% | | **Goedel-Prover-SFT** | **32** | **57.6%** | |------------------------|------------------|------------------| | DeepSeek-Prover-V1.5-SFT | 3200 | 53.3% | | DeepSeek-Prover-V1.5-RL | 3200 | 54.9% | | **Goedel-Prover-SFT** | **3200** | **62.7%** | |------------------------|------------------|------------------| | DeepSeek-Prover-V1.5-SFT | 25600 | 55.8% | | DeepSeek-Prover-V1.5-RL | 25600 | 58.5% | | **Goedel-Prover-SFT** | **25600** | **64.7%** | </div> ## 引用 latex @misc{lin2025goedelproverfrontiermodelopensource, title={Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving}, author={Yong Lin and Shange Tang and Bohan Lyu and Jiayun Wu and Hongzhou Lin and Kaiyu Yang and Jia Li and Mengzhou Xia and Danqi Chen and Sanjeev Arora and Chi Jin}, year={2025}, eprint={2502.07640}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2502.07640}, }
提供机构:
maas
创建时间:
2025-07-22
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作