ChristianZ97/SATP-Config-Buffer
收藏Hugging Face2026-04-22 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/ChristianZ97/SATP-Config-Buffer
下载链接
链接失效反馈官方服务:
资源简介:
SATP-Config-Buffer数据集是在SATP(Steering Aesop for Theorem Proving)训练过程中收集的Aesop策略配置集合。该数据集用作训练SATP-aesop-policy模型的正负回放缓冲区。每条数据包含一个定理和一个Aesop配置对,并带有奖励标签:+1.0表示配置成功证明了定理(正例/成功),-1.0表示配置未能证明定理(负例/扰动)。数据集包含以下列:theorem_uuid(定理的唯一标识符)、config_uuid(配置的唯一标识符)、formal_statement(Lean 4定理陈述,包括`import Mathlib`头)、tactic_string(完整的Aesop配置块)、reward(奖励值)、lemma_names(检索到的前提全名列表)和lemma_scores(检索相似度分数列表)。数据集的定理来源于NuminaMath-LEAN数据集,而本数据集仅贡献了SATP训练过程中发现的每个定理的Aesop配置。
The SATP-Config-Buffer dataset is a collection of Aesop tactic configurations gathered during SATP (Steering Aesop for Theorem Proving) training. This dataset serves as the positive and negative replay buffer for training the SATP-aesop-policy model. Each row consists of a (theorem, aesop config) pair labeled with a reward: +1.0 indicates the config successfully proved the theorem (positive/success), while -1.0 indicates it failed (negative/perturbation). The dataset includes columns such as theorem_uuid (unique identifier for the theorem), config_uuid (unique identifier for the config), formal_statement (Lean 4 theorem statement including `import Mathlib` header), tactic_string (full Aesop config block), reward (reward value), lemma_names (list of top-K retrieved premise full names), and lemma_scores (retrieval similarity scores). The theorems are sourced from the NuminaMath-LEAN dataset, and this dataset specifically provides the Aesop configurations discovered during SATP training.
提供机构:
ChristianZ97



