ChristianZ97/NuminaMath-LEAN-SATP-cleaned
收藏Hugging Face2026-04-22 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/ChristianZ97/NuminaMath-LEAN-SATP-cleaned
下载链接
链接失效反馈官方服务:
资源简介:
NuminaMath-LEAN-SATP-cleaned是从ChristianZ97/NuminaMath-LEAN-cleaned派生而来的最小SATP训练集切片。数据集仅包含两列:稳定的定理标识符(uuid)和Lean 4定理文本(formal_statement)。uuid是通过sha256(canonical(formal_statement))的前16位生成的,与ChristianZ97/SATP-Config-Buffer中的theorem_uuid列对齐。formal_statement列包含Lean 4定理文本,保留了import头信息,因为约1%的行包含非默认的import(例如Mathlib.Tactic、Aesop等)。数据集的目标是提供一个低成本分发且易于与SATP重放缓冲区结合的存储库。未来版本计划添加一个goal_state列,该列由Lean元编程生成。验证和测试集使用brando/minif2f-lean4和AI-MO/minif2f_test,这些数据在上游使用,不在此存储库中镜像。
NuminaMath-LEAN-SATP-cleaned is a minimal SATP training-set slice derived from ChristianZ97/NuminaMath-LEAN-cleaned. It contains only two columns — a stable theorem identifier (uuid) and the Lean 4 theorem text (formal_statement). The uuid is generated as the first 16 hex digits of sha256(canonical(formal_statement)), aligning with the theorem_uuid column in ChristianZ97/SATP-Config-Buffer. The formal_statement column contains the Lean 4 theorem text with the import header kept embedded because ~1% of rows carry non-default imports (e.g., Mathlib.Tactic, Aesop). The goal of this dataset is to provide a repository that is cheap to distribute and straightforward to join with the SATP replay buffer. A future revision will add a goal_state column produced by Lean metaprogramming. For validation and test sets, the SATP pipeline uses brando/minif2f-lean4 and AI-MO/minif2f_test, which are consumed upstream without mirroring here.
提供机构:
ChristianZ97



