Name: DiffLean/DPO_Dataset
Creator: DiffLean
Published: 2026-04-13 14:31:49
License: 暂无描述

下载链接：

https://hf-mirror.com/datasets/DiffLean/DPO_Dataset

下载链接

链接失效反馈

官方服务：

资源简介：

--- language: - en task_categories: - text-generation tags: - dpo - preference-dataset - formal-reasoning - lean - math dataset_info: features: - name: Question dtype: string - name: Source dtype: string - name: Formal Statement dtype: string - name: Problem Type dtype: string - name: Chosen Formal Proof dtype: string - name: Chosen Reasoning dtype: string - name: Rejected Formal Proof dtype: string - name: Rejected Reasoning dtype: string splits: - name: train num_bytes: 6430157648 num_examples: 128595 download_size: 1442082212 dataset_size: 6430157648 configs: - config_name: default data_files: - split: train path: data/train-* --- # DPO_Dataset ## Dataset Description This repository contains the merged DiffLean DPO preference dataset used for formal reasoning and Lean proof generation experiments. Each training row includes a problem statement, its formal Lean statement, and a chosen/rejected pair of natural-language reasoning traces and formal proofs. ## Data Sources This upload merges the following local files: - `dpo_dataset/paired_pass_fail_dataset.json` - `dpo_dataset/formal_proof_verified_paired_pass_fail_dataset.json` ## Dataset Structure - Split: `train` - Number of rows: `128595` - Number of columns: `8` ### Fields - `Question`: Natural-language math problem statement, including the final answer text. - `Source`: Dataset/source identifier for the example. - `Formal Statement`: Lean theorem statement associated with the problem. - `Problem Type`: Problem category label carried over from the source dataset. - `Chosen Formal Proof`: Preferred Lean proof/program for the statement. - `Chosen Reasoning`: Preferred natural-language reasoning paired with the chosen proof. - `Rejected Formal Proof`: Rejected Lean proof/program for the same statement. - `Rejected Reasoning`: Rejected natural-language reasoning paired with the rejected proof. ### Source Composition - `dart-math-hard`: 61113 - `synthetic`: 54107 - `openr1`: 8755 - `deepscaler`: 4620 ## Intended Use The dataset is intended for preference modeling, reward modeling, or DPO-style training on mathematical reasoning and formal proof generation tasks. ## Limitations The examples are aggregated from generated and filtered pipelines and may still contain noise, formatting issues, or imperfect preference labels. Chosen responses should not be treated as guaranteed ground truth, and rejected responses are not necessarily universally incorrect. ## Ethical Considerations This dataset contains model-generated mathematical reasoning and formal proofs. Please validate downstream outputs before using them in high-stakes settings, especially when correctness matters.

应用场景：