stepfun-ai/StepFun-Formalizer-Training
收藏Hugging Face2025-10-13 更新2025-10-25 收录
下载链接:
https://hf-mirror.com/datasets/stepfun-ai/StepFun-Formalizer-Training
下载链接
链接失效反馈官方服务:
资源简介:
StepFun-Formalizer数据集包含两个部分:NuminaMath-Formal-SFT-183K和StepFun-Formalizer-RL-6K。NuminaMath-Formal-SFT-183K是由Kimina-Autoformalizer-7B将NuminaMath-1.5中的非正式-正式问题对进行翻译得到,用于补充模型在正式语言领域的知识。StepFun-Formalizer-RL-6K则是收集的非正式数学问题及其对应的由人类标注或模型生成并人工审核的正式语句,作为强化学习数据,包括MiniF2F、ProofNet、PutnamBench、Compfiles和FormalMATH(不含Lite)。所有正式语句都通过了Lean4 v4.15.0的语法检查,所有训练数据都进行了13-gram的去污染处理。
The StepFun-Formalizer dataset consists of two parts: NuminaMath-Formal-SFT-183K and StepFun-Formalizer-RL-6K. NuminaMath-Formal-SFT-183K is translated from informal-formal problem pairs in NuminaMath-1.5 by Kimina-Autoformalizer-7B, used to supplement the models domain knowledge in formal language. StepFun-Formalizer-RL-6K is a collection of informal math problems paired with human-annotated (or model-generated and manually reviewed) formal statements as reinforcement learning data, including MiniF2F, ProofNet, PutnamBench, Compfiles, and FormalMATH (without Lite). All formal statements pass the syntax check of Lean4 v4.15.0, and all training data have undergone 13-gram decontamination against benchmarks.
提供机构:
stepfun-ai



