five

nvidia/Nemotron-Math-Proofs-v1

收藏
Hugging Face2025-12-18 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/nvidia/Nemotron-Math-Proofs-v1
下载链接
链接失效反馈
官方服务:
资源简介:
Nemotron-Math-Proofs-v1是一个大规模数学推理数据集,包含约58万自然语言证明问题、约55万Lean 4定理陈述的形式化,以及约90万模型生成的推理轨迹,最终形成Lean 4证明。该数据集整合了人类编写的问题与系统生成的形式化和解决方案轨迹。每个自然语言问题通过gpt-oss-120b形式化为Lean 4定理陈述,并包含证明的占位符。推理轨迹和证明由Goedel-Prover-V2-32B尝试生成,并通过Lean 4编译器验证。数据集适用于商业用途,旨在训练LLMs进行结构化数学推理,特别是生成Lean 4形式化证明,以及构建长上下文或多轨迹推理系统。

Nemotron-Math-Proofs-v1 is a large-scale mathematical reasoning dataset containing ~580k natural language proof problems, ~550k formalizations into theorem statements in Lean 4, and ~900k model-generated reasoning trajectories culminating in Lean 4 proofs. The dataset integrates human-authored problems with systematically generated formalizations and solution traces. Each natural language problem is formalized by gpt-oss-120b into Lean 4 theorem statements with placeholders for the proof. Reasoning traces and proofs are attempted by Goedel-Prover-V2-32B and verified by the Lean 4 compiler. This dataset is ready for commercial use and intended for training LLMs to perform structured mathematical reasoning, in particular generation of Lean 4 formal proofs, and building long-context or multi-trajectory reasoning systems.
提供机构:
nvidia
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作