five

lm-provers/FineProofs-SFT

收藏
Hugging Face2026-02-14 更新2026-04-05 收录
下载链接:
https://hf-mirror.com/datasets/lm-provers/FineProofs-SFT
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 task_categories: - text-generation - question-answering language: - en tags: - math - reasoning - olympiad - proof-generation - chain-of-thought pretty_name: FineProofs SFT Dataset size_categories: - 1K<n<10K dataset_info: - config_name: all features: - name: problem dtype: string - name: reasoning_content dtype: string - name: proof dtype: string - name: category dtype: string - name: competition dtype: string - name: gemini-3-pro-grade dtype: int64 - name: qwen3-4b-thinking-reward@128 dtype: float64 - name: source dtype: string - name: messages list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 1261177070 num_examples: 7777 download_size: 559037559 dataset_size: 1261177070 - config_name: default features: - name: problem dtype: string - name: reasoning_content dtype: string - name: proof dtype: string - name: category dtype: string - name: competition dtype: string - name: gemini-3-pro-grade dtype: int64 - name: qwen3-4b-thinking-reward@128 dtype: float64 - name: source dtype: string - name: messages list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 694239300 num_examples: 4281 download_size: 346092746 dataset_size: 694239300 configs: - config_name: all data_files: - split: train path: all/train-* - config_name: default data_files: - split: train path: data/train-* --- # FineProofs SFT ## Dataset Description **FineProofs SFT** is a high-quality supervised fine-tuning dataset containing mathematical Olympiad problems paired with chain-of-thought reasoning and formal proofs distilled from [DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2). The dataset comprises **7,777 samples (4,300 unique problems)** sourced from international Olympiad competitions and Art of Problem Solving (AoPS), each annotated with: - Detailed reasoning traces (thinking content) generated by [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) - Formal mathematical proofs generated by [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) - Expert grades from Gemini-3-Pro (0-7 point scale) - Model-based difficulty/reward scores from [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) - Problem categorization and competition metadata This dataset is specifically designed for training reasoning models to generate proofs for competition-level mathematics. ### Key Features - **Chain-of-Thought Reasoning**: Each solution includes explicit reasoning traces enclosed in `<think>` tags, distilled from [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) - **Formal Proofs**: LaTeX-formatted mathematical proofs - **Dual Quality Signals**: - Expert grades from Gemini-3-Pro (continuous 0-7 scale) - Difficulty estimates from [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) reward@128 scored by [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) - **Unfiltered Teacher Distillation**: Includes both correct and incomplete solutions from teacher model to maximize learning signal for smaller models - **Diverse Sources**: Problems from IMO, APMO, USA(J)MO, and other prestigious competitions - **Rich Metadata**: Includes problem type categorization, competition information, and source attribution - **Benchmark Decontaminated**: No overlap with [IMO-ProofBench](https://huggingface.co/papers/2511.01846) or [ProofBench](https://huggingface.co/datasets/wenjiema02/ProofBench) evaluation sets ## Dataset Structure ### Data Fields | Field | Type | Description | |-------|------|-------------| | `problem` | string | The mathematical problem statement | | `reasoning_content` | string | Internal reasoning trace (extracted from `<think>` tags) | | `proof` | string | The formal mathematical proof/solution | | `category` | string | Problem type (e.g., Algebra, Combinatorics, Number Theory, Geometry) | | `competition` | string | Source competition and year (e.g., "IMO-2023", "USAMO-2022") | | `gemini-3-pro-grade` | int | Expert grade from 0-7 based on IMOProofBench rubric | | `qwen3-4b-thinking-reward@128` | float | Mean reward score from [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) model | | `source` | string | Dataset origin: "olympiads" or "aops_v2" | | `messages` | list | Full conversation format with `<think>` tags preserved | ### Data Splits The dataset is available in two configurations: | Configuration | Split | Rows | Description | |--------------|-------|------|-------------| | `default` | train | 4,300 | Unique problems only (deduplicated) | | `all` | train | 7,777 | All samples including duplicates | **Loading the dataset:** ```python from datasets import load_dataset # Load unique problems only (recommended for SFT) dataset = load_dataset("lm-provers/FineProofs-SFT", split="train") # Load all samples including duplicates dataset_all = load_dataset("lm-provers/FineProofs-SFT", "all", split="train") ``` ### Example Entry ```json { "problem": "Let $n$ be a positive integer. Prove that there exist positive integers $a$ and $b$ such that $n$ divides $a^2 + b^2$ but $n$ does not divide $a$ or $b$.", "reasoning_content": "We need to find positive integers a and b such that n divides a² + b². Let's consider the residues modulo n...", "proof": "Consider the set of residues {0, 1, 2, ..., n-1}. By the Pigeonhole Principle...\n\nTherefore, there exist positive integers a and b satisfying the required conditions.", "category": "Number Theory", "competition": "IMO-2019", "gemini-3-pro-grade": 7, "qwen3-4b-thinking-reward@128": 0.85, "source": "olympiads", "messages": [ {"role": "user", "content": "Let $n$ be a positive integer..."}, {"role": "assistant", "content": "<think>We need to find...</think>\n\nConsider the set of residues..."} ] } ``` ## Dataset Creation ### Source Data The dataset is derived from two primary public datasets: 1. [AI-MO/olympiads](https://huggingface.co/datasets/AI-MO/olympiads) - Official solutions from national and international IMO-level mathematical competitions - Competitions: IMO, APMO, USAMO, USAJMO, and other international olympiads 2. [AI-MO/aops](https://huggingface.co/datasets/AI-MO/aops) - Problems from Art of Problem Solving forums - Competitions: Various national and regional competitions ### Multi-Stage Filtering The raw data underwent extensive filtering to ensure high quality: 1. **Structural Filtering**: - Removed problems involving diagrams or images (text-only models) - Discarded trivial problems where answers appear in the statement - Filtered implausibly short or purely computational solutions - Excluded lower-difficulty contests (AMC, routine integral exercises) - Removed all 2025 problems to prevent benchmark contamination 2. **Quality Filtering with GPT-5-Nano**: - Flagged incomplete proofs - Identified inconsistencies between problem statements and solutions - Detected reasoning with major logical gaps 3. **Decontamination**: - Fuzzy string matching to prevent overlap with evaluation benchmarks (IMOProofBench, ProofBench) The result is a curated collection spanning **Algebra, Calculus, Combinatorics, Geometry, Inequalities, Logic and Puzzles, and Number Theory** with substantially higher signal-to-noise ratio than raw scraped corpora. ### Data Collection and Processing 1. **Solution Generation**: - Solutions with chain-of-thought reasoning were generated using **[deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2)** - Used 128k token context limit - Inference orchestrated via **SGLang** across 8 parallel model instances - Each instance distributed over two 8xH100 nodes (TP=8, EP=8, PP=2) - Central router load-balanced requests at **3k tokens/second** aggregate throughput - Solutions formatted with `<think>` tags for internal reasoning 2. **Structural Filtering**: - Retained only structurally valid completions containing: - Closed reasoning blocks (`<think>...</think>`) - Explicit proof sections 3. **Grading with Gemini-3-Pro**: - Solutions graded using continuous **0-7 rubric** based on ProofBench methodology - Rubric specifies: - Major conceptual steps required for complete solution - Common failure modes warranting zero credit - Intermediate checkpoints for partial correctness - **No filtering by grade**: Even incorrect solutions from the teacher model (685B) provide useful learning signal for the student (4B) 4. **Difficulty/Reward Scoring**: - **[Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)** sampled 128 times per problem (reward@128) - Each rollout graded by [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) (medium reasoning effort) - Mean reward scores used to estimate problem difficulty and solution quality - This data enables curriculum learning during RL training 5. **Categorization**: - Problems categorized by mathematical domain using automated classification - Domains: Algebra, Calculus, Combinatorics, Geometry, Inequalities, Logic and Puzzles, Number Theory 6. **Deduplication**: - Exact duplicate problems removed based on problem text for the `default` configuration - Final dataset: **7,777 total samples from 4,300 unique problems** - Available in two configurations: `default` (4,300 unique) and `all` (7,777 with duplicates) ## Intended Use ### Primary Use Cases 1. **Supervised Fine-Tuning (Distillation)**: Training language models to generate mathematical proofs with explicit reasoning by distilling from [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) 2. **Curriculum Learning for RL**: Using reward@128 scores to filter problems by difficulty and learnability 3. **Mathematical Reasoning Research**: Studying proof generation strategies for olympiad-level problems 4. **Reward Model Training**: Using grades and reward scores for preference learning and RL reward signals ### Training Recipe This dataset was used to train **[QED-Nano](https://huggingface.co/lm-provers/QED-Nano)**, a 4B parameter model that achieves competitive performance on IMOProofBench and ProofBench. The training pipeline consisted of: 1. **Stage I: Offline Distillation (SFT)** - Fine-tuned [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) on the dataset - 620 optimization steps, batch size 32 - Cosine LR schedule peaking at 3×10⁻⁵ - Max sequence length: 45,056 tokens - Training on unique problems (4.3k, `default` configuration) outperformed training on all samples (7.7k, `all` configuration) 2. **Stage II: Reinforcement Learning with Reasoning Cache** - GRPO with rubric-based rewards (0-7 continuous scale) - Curriculum learning using reward@128 filtering (mean reward ≤ 0.7, std dev > 0.08) - Training judge: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) (medium reasoning effort) - Multi-turn refinement for long-horizon reasoning - Effective context beyond single-pass token limits ### Supported Tasks - Mathematical proof generation - Chain-of-thought reasoning - Mathematical problem solving - Formal reasoning with LaTeX ### Compatible Formats The `messages` field provides direct compatibility with: - OpenAI chat format - HuggingFace chat templates - Standard instruction-tuning pipelines ## Dataset Statistics ### Overall Size - **Total Samples**: 7,777 (all configuration) / 4,300 (default configuration) - **Unique Problems**: 4,300 - **Duplication Rate**: ~1.81x (some problems have multiple solutions) - **Source Distribution**: - Olympiads: 4,617 samples - AoPS: 3,160 samples ## Limitations and Biases 1. **Language**: Dataset is exclusively in English 2. **Domain**: Focused on Olympiad-level mathematics (may not transfer to other mathematical domains) 3. **Text-Only**: Problems involving diagrams or images were filtered out 4. **Teacher Model Quality**: Solutions generated by [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) may contain errors (intentionally not filtered to preserve learning signal) 5. **Grading Methodology**: - Grades from automated LLM judge (Gemini-3-Pro) may not perfectly align with human expert judgment - Continuous 0-7 rubric provides denser signal than binary correctness 6. **Think Tag Format**: Solutions use specific `<think>` tag format that may need adaptation for different training frameworks 7. **Difficulty Level**: Problems are competition-level and may not be suitable for training on basic mathematics 8. **Benchmark Contamination**: All 2025 problems were explicitly removed to avoid evaluation set overlap 9. **Length Explosion Risk**: SFT models trained on this data may exhibit uncontrolled reasoning length growth without RL regularization ## Citation If you use this dataset in your research, please cite: ```bibtex @dataset{fineproofs_sft_dataset_2026, title={FineProofs SFT Dataset: Mathematical Olympiad Problems with Chain-of-Thought Reasoning}, author={Edward Beeching and Jasper Dekoninck and Jia Li and Yuxiao Qu and Amrith Setlur and Ian Wu and Aviral Kumar and Lewis Tunstall}, year={2026}, publisher={Hugging Face}, howpublished={\url{https://huggingface.co/datasets/lm-provers/FineProofs-SFT}} } ``` ## License This dataset is released under the **Apache 2.0** license. You are free to use, modify, and distribute this dataset for both commercial and non-commercial purposes, subject to the terms of the Apache 2.0 license. See the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for full details. ## Acknowledgments - **Project Numina** for the Olympiads and AOPS source datasets, and inspiration from NuminaMath datasets - **Google DeepMind** for Gemini-3-Pro grading capabilities - **DeepSeek** for [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) model used in solution generation - **Qwen Team** for [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) model used in difficulty/reward scoring
提供机构:
lm-provers
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作