DiffLean/SFT_Dataset
收藏Hugging Face2026-04-26 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/DiffLean/SFT_Dataset
下载链接
链接失效反馈官方服务:
资源简介:
DiffLean SFT数据集是一个用于在Lean 4中监督微调形式定理证明器的合成数据集。该数据集包含656,160个数学问题的示例,每个示例都配有自然语言问题、形式化陈述和经过验证的Lean 4形式化证明。数据集涵盖多个配置(如Reasoning、processed、processed_hard、rl_easy_20k),每个配置具有不同的特征,例如问题类型、来源、难度标签(在processed_hard中,合成行标记为easy,现有DiffLean非合成行标记为medium,附加的BigMath行标记为hard)以及RL保留桶(在rl_easy_20k中)。数据集主要用于形式定理证明和数学推理任务,支持文本生成和形式验证。
DiffLean SFT Dataset is a synthetic dataset for supervised fine-tuning of formal theorem provers in Lean 4. It contains 656,160 examples of mathematical problems paired with formal statements and verified formal proofs in Lean 4. The dataset includes multiple configurations (e.g., Reasoning, processed, processed_hard, rl_easy_20k) with varying features such as problem type, source, difficulty labels (in processed_hard, synthetic rows are easy, existing DiffLean non-synthetic rows are medium, appended BigMath rows are hard), and RL holdout buckets (in rl_easy_20k). It is designed for formal theorem proving and mathematical reasoning tasks, supporting text generation and formal verification.
提供机构:
DiffLean



