
annakosovskaia/NuminaMath-1.5-RL-Verifiable-cleaned

Hugging Face · Updated 2026-05-18 · Indexed 2026-05-31
Download link:
https://hf-mirror.com/datasets/annakosovskaia/NuminaMath-1.5-RL-Verifiable-cleaned
---
configs:
- config_name: all
  data_files:
  - split: train
    path: all/train-*
- config_name: clean
  data_files:
  - split: train
    path: clean/train-*
dataset_info:
- config_name: all
  features:
  - name: problem
    dtype: large_string
  - name: solution
    dtype: large_string
  - name: problem_type
    dtype: large_string
  - name: question_type
    dtype: large_string
  - name: problem_is_valid
    dtype: large_string
  - name: solution_is_valid
    dtype: large_string
  - name: source
    dtype: large_string
  - name: synthetic
    dtype: bool
  - name: is_verifiable_final_answer_task
    dtype: bool
  - name: is_coherent_solution
    dtype: bool
  - name: is_complete
    dtype: bool
  - name: has_final_answer
    dtype: bool
  - name: confidence
    dtype: large_string
  - name: validation_raw
    dtype: large_string
  - name: answer
    dtype: large_string
  - name: problem_id
    dtype: int64
  splits:
  - name: train
    num_bytes: 161718940
    num_examples: 100050
  download_size: 63184810
  dataset_size: 161718940
- config_name: clean
  features:
  - name: problem
    dtype: large_string
  - name: solution
    dtype: large_string
  - name: problem_type
    dtype: large_string
  - name: question_type
    dtype: large_string
  - name: problem_is_valid
    dtype: large_string
  - name: solution_is_valid
    dtype: large_string
  - name: source
    dtype: large_string
  - name: synthetic
    dtype: bool
  - name: is_verifiable_final_answer_task
    dtype: bool
  - name: is_coherent_solution
    dtype: bool
  - name: is_complete
    dtype: bool
  - name: has_final_answer
    dtype: bool
  - name: confidence
    dtype: large_string
  - name: validation_raw
    dtype: large_string
  - name: answer
    dtype: large_string
  - name: problem_id
    dtype: int64
  splits:
  - name: train
    num_bytes: 132728968
    num_examples: 81147
  download_size: 51808498
  dataset_size: 132728968
---

# NuminaMath-1.5-RL-Verifiable (cleaned)

A cleaned and validated subset of [`nlile/NuminaMath-1.5-RL-Verifiable`](https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable). LLM-validated and re-extracted final answers for use in math RL / SFT pipelines.

## What changed

Starting from `nlile/NuminaMath-1.5-RL-Verifiable` (131,063 rows), we applied:

### 1. Regex/structural cleanup (`prepare_numina.py`)

**Stripped problem-number prefixes:**

- `Problem 3.`, `Problem A2.`, `Problem N1.`, `Problem 15`
- `Task 1.`, `Task A-1.1.`, `Task Condition`
- `## Problem ...`, `## Task ...`, `## Aufgabe 1`, `## Zadatak B-1.1.`, `## Subject I`, `## Exercise 5`, `## Condition of the problem`
- `# 15.`, `# 6.1. Condition:`, `# Task 4. (10 points)`
- `A3.`, `B2.`, `NT1`, `NT12`, `1A.`, `2B.`
- `A 1.`, `NT 3.` (letter+space+digit)
- `96.2.`, `03.4.` (multi-level numbers)
- `XXXVIII OM - II - Zadanie 4`, `LIV OM - II - Task 3`, `L OM - I - Problem 8` (competition headers)
- `[4 points]`, `(7 points)`, `II. (5 points)`
- `(Option 1)`, `[u]Round 5[/u]`
- Topic tags: `[ Arithmetic. Mental calculation, etc.]`, `[ Decimal numeral system ]`
- 3-letter country codes followed by newline: `MLD`, `ALB`, `SAU`, `EST-`

**Stripped solution prefixes:**

- `Solution.`, `Solution 1.`, `Solution:`, `SOLUTION.`
- `# Solution.`, `## Solution`, `## Solution 1:`, `1. Solution.`, `2. Solution.`
- `[Solution]`, `【Solution】` (bracketed markers)
- `Answer: ...` / `Answer N.` prefix at start (to avoid teaching answer-first generation)
- `22. Answer: 13\nSolution. ...` (problem-number + leaked answer + Solution chain)

**Stripped solution inline / trailing markers:**

- `Detailed Explanation:`, `Detailed Solution:`
- Repeated translation artifacts (`Certainly, here is the translation: ---`)
- Trailing citation tails (e.g. `Kuznetsov Differentiation Problem 17-10`)
- Trailing `Answer: ...` line at end of solution (redundant: answer is in `answer` column)
- Trailing grading rubrics (`Evaluation Criteria:`, `Award N points`)

**Dropped rows** where the solution was less than 30 chars after cleanup (these were "answer-only solutions" with no reasoning content).

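The prefix stripping and the 30-character drop rule can be sketched roughly as follows. This is a minimal illustration, not the actual `prepare_numina.py`, which is not included in this card: the two patterns, `MIN_SOLUTION_CHARS`, and `clean_row` are hypothetical names, and the patterns cover only a handful of the prefixes listed above.

```python
import re

# Hypothetical patterns for a few of the listed prefixes (assumption:
# the real script uses a much larger set of rules).
PROBLEM_PREFIX = re.compile(
    r"^\s*(?:#+\s*)?(?:Problem|Task|Exercise)\s+[A-Z]?-?\d+(?:\.\d+)*\.?\s*"
)
SOLUTION_PREFIX = re.compile(
    r"^\s*(?:#+\s*)?(?:\d+\.\s*)?(?:Solution|SOLUTION)\b(?:\s+\d+)?[.:]?\s*"
)
MIN_SOLUTION_CHARS = 30  # threshold stated in the card


def clean_row(problem: str, solution: str):
    """Strip leading markers; return None when the row should be dropped."""
    problem = PROBLEM_PREFIX.sub("", problem, count=1)
    solution = SOLUTION_PREFIX.sub("", solution, count=1).strip()
    if len(solution) < MIN_SOLUTION_CHARS:
        # "Answer-only" solution with no reasoning content.
        return None
    return problem, solution
```

For example, `clean_row("Task A-1.1. Compute 1+1.", "Solution: 2")` returns `None` under these assumptions, because only `2` remains after the `Solution:` marker is removed.
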
**Dropped rows with missing-image references:** `[asy]`, `\includegraphics`, `Fig.`, `Figure N`, `as shown in the figure`, `see diagram`, `in the diagram above`, image filenames (`*.jpg`, `*.png`, ...), xy-pic diagram commands (`\spos`, `\xymatrix`).

**Dropped rows** where the `problem` field actually contained a solution/answer (e.g. starts with `Solution.` or `Answer:`).

**Dropped multi-part problems** (`a) ... b) ...`, `1) ... 2) ...`), because answer extraction is unreliable for these.

### 2. LLM-based quality validation (Qwen3-32B)

Each row was scored on:

- `is_verifiable_final_answer_task`: not a "prove that..." task
- `is_coherent_solution`: internally consistent
- `is_complete`: no truncation, no dangling references, no missing announced steps
- `has_final_answer`: explicit final answer stated
- `confidence`: `high` / `medium` / `low`

<details>
<summary>Validation prompt</summary>

```
You are evaluating a math competition solution for dataset quality.
You will be given a PROBLEM and a SOLUTION. Evaluate the solution and return a JSON object.

Return ONLY a JSON object with these fields:
{
  "is_verifiable_final_answer_task": <bool>,
  "is_coherent_solution": <bool>,
  "is_complete": <bool>,
  "has_final_answer": <bool>,
  "confidence": "high" | "medium" | "low"
}

Field definitions:
- is_verifiable_final_answer_task: the problem asks for a specific answer (value, set, expression, count) that can be verified — NOT a pure "prove that..." or "show that..." task
- is_coherent_solution: the solution steps follow logically from the problem; internally consistent and not self-contradictory
- is_complete: no truncation mid-sentence or mid-equation; no announced steps/equations/expressions that are then absent ("we get:", "substituting:", followed by nothing); no references to figures or diagrams not present in the text; no dangling references like "from equation (2)" where equation (2) was never shown; argument reaches a conclusion
- has_final_answer: the solution explicitly states a final answer (boxed answer, "the answer is X", "therefore X = ...") — false if it ends without a clearly stated result
- confidence: high = easy to assess, criteria are clear-cut; medium = one or two criteria are borderline; low = hard to assess overall

No explanation, no markdown, just the JSON object.
```

</details>

### 3. Multi-form answer re-extraction (Qwen3-32B → Qwen3-235B)

The original `answer` field was often incorrectly extracted, so we re-extracted it from scratch. The final answer was taken directly from the solution in 1-3 semantically equivalent forms (e.g. `["x \in \{1, 3\}", "\{1, 3\}"]`). The first pass was done by Qwen3-32B; the result was then verified and corrected by Qwen3-235B-A22B-Instruct-2507.

<details>
<summary>Extraction prompt (Qwen3-32B, first pass)</summary>

```
Return the final answer of this math solution. Provide 1-3 SEMANTICALLY different forms (skip trivial syntactic rewrites — those are normalized downstream).

Useful variants:
  • multiple choice: "a)" and "True"
  • parametric: "x = k\pi, k \in \mathbb{Z}" and "\{k\pi : k \in \mathbb{Z}\}"
  • pair vs two values: "(2, 3)" and "2, 3"
  • binomial: "\binom{n}{k}" and "C_n^k" and "C(n,k)"

Do NOT add variants for: \frac vs \dfrac, \sqrt{3} vs \sqrt 3, 0.5 vs 1/2, \left( vs (, \text{abc} vs abc, x=5 vs 5, \{1,3\} vs 1,3, \{(2,3)\} vs (2,3) — these are auto-normalized.

Rules: preserve LaTeX as in the solution. No "Answer:", no \boxed{}, no explanations.
One form per line. If no answer: NONE

PROBLEM:
{problem}

SOLUTION:
{solution}
```

</details>

<details>
<summary>Verification + correction prompt (Qwen3-235B, second pass)</summary>

```
You are correcting extracted answer variants for a math problem. The variants below were produced by a weaker model and may contain mistakes.

Look at PROBLEM and SOLUTION, decide what the true final answer is, then output a clean list of variants — at most 3 SEMANTICALLY EQUIVALENT forms of that single true answer.

Strict rules:
1. If a list/set is a SINGLE answer (e.g. "n can be 12, 14, 18, 22, or 32"), output it as ONE variant: "12, 14, 18, 22, 32" or "\{12, 14, 18, 22, 32\}". Do NOT split list members across multiple variant lines.
2. DROP every variant that is:
   - prose / explanation in words (e.g. "There are 33 even numbers", "Total of 27 solutions")
   - a wrong / different numerical value not supported by the solution
   - an alternative hypothesis the model wasn't sure about
   - a sub-result, intermediate step, or unrelated object (e.g. listing example values when the answer is a count)
   - a trivial syntactic rewrite (\frac vs \dfrac, 0.5 vs 1/2, \{1,3\} vs 1,3 — auto-normalized)
3. KEEP variants that are SEMANTICALLY DIFFERENT FORMS of the same answer:
   - "(2, 3)" and "2, 3" (pair vs values)
   - "x = k\pi, k \in \mathbb{Z}" and "\{k\pi : k \in \mathbb{Z}\}" (parametric)
   - "a)" and "True" and "1" (multiple choice)
   - "\binom{n}{k}" and "C_n^k" (alternate notation)

Output one variant per line. Preserve LaTeX. No explanations, no labels, no "Answer:".
If no valid variants: NONE

PROBLEM:
{problem}

SOLUTION:
{solution}

CURRENT VARIANTS:
{variants}
```

</details>

## Schema

| Column | Type | Description |
|---|---|---|
| `problem` | str | Cleaned problem statement |
| `solution` | str | Cleaned solution / chain of thought |
| `answer` | str (JSON list) | 1-3 equivalent answer forms; `None` if not extractable |
| `problem_type`, `question_type`, `source`, `synthetic` | | Inherited from source |
| `is_verifiable_final_answer_task` | bool | |
| `is_coherent_solution` | bool | |
| `is_complete` | bool | |
| `has_final_answer` | bool | |
| `confidence` | str | `high` / `medium` / `low` |
| `validation_raw` | str | Raw LLM output from the validation step (debugging) |

## Stats

- **Total rows:** 100,050 (down from 131,063)
- **With canonical `answer`:** 96,049 (96.0%)

## Usage

```python
from datasets import load_dataset
import json

# The card metadata defines two configs, "all" (100,050 rows) and
# "clean" (81,147 rows), so the config name is passed explicitly.
ds = load_dataset("annakosovskaia/NuminaMath-1.5-RL-Verifiable-cleaned", "all", split="train")
row = ds[0]
variants = json.loads(row["answer"])  # list of equivalent answer forms
```

For RL/SFT, filter to high-quality rows:

```python
ds = ds.filter(
    lambda x: x["is_verifiable_final_answer_task"]
    and x["is_coherent_solution"]
    and x["is_complete"]
    and x["has_final_answer"]
    and x["confidence"] in {"high", "medium"}
    and x["answer"] is not None
)
```

## License

Same as the source dataset.
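Because the `answer` column stores a JSON list of equivalent forms, a reward or accuracy check can accept a prediction that matches any one of them. The sketch below is only an illustration: `normalize` is a toy stand-in for the downstream normalizer that the prompts mention but this card does not show, and `matches_any_variant` is a hypothetical helper name.

```python
import json
import re
from typing import Optional


def normalize(ans: str) -> str:
    """Toy normalizer (an assumption, not the card's actual normalization)."""
    ans = ans.strip().strip("$")
    ans = ans.replace("\\dfrac", "\\frac")
    # Drop \left / \right sizing commands but leave e.g. \leftarrow alone.
    ans = re.sub(r"\\(?:left|right)(?![A-Za-z])", "", ans)
    return re.sub(r"\s+", "", ans)


def matches_any_variant(prediction: str, answer_json: Optional[str]) -> bool:
    """True if the prediction equals any stored answer form after normalizing."""
    if not answer_json:
        return False
    try:
        variants = json.loads(answer_json)
    except json.JSONDecodeError:
        # Covers a non-JSON placeholder for "not extractable".
        return False
    if not variants:
        return False
    target = normalize(prediction)
    return any(normalize(v) == target for v in variants)
```

Exact string matching after normalization is the simplest option; a symbolic comparison would catch more equivalent forms, at the cost of extra dependencies and parsing failures.
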
Provider:
annakosovskaia