
mihailgribov/olympiad_style_integer_math_reasoning

Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/mihailgribov/olympiad_style_integer_math_reasoning
Description:
---
license: cc-by-4.0
pretty_name: Olympiad Math Reasoning Traces
task_categories:
- text-generation
- question-answering
language:
- en
configs:
- config_name: default
  data_files:
  - split: train
    path: reasoning.parquet
---

# Olympiad Math Reasoning Traces

**Version:** v1.0.2<br>
**Release date:** 2026-04-19

**64,763** full model reasoning traces for olympiad-style math problems with verified integer answers. This dataset contains **only correct and non-truncated** traces: every record contains a terminal `\boxed{...}` answer (within the last 500 characters of the response) that matches the expected integer exactly, and none of the responses hit the model's generation-token cap.

Intended for distillation and supervised fine-tuning on chain-of-thought math reasoning. Each record is self-contained: problem text, expected answer, and the model's complete response are all included.

## Coverage by Model

Every trace in this dataset comes from one of three solver models. The `model` field distinguishes them, so consumers can filter by model (e.g. to train only on compact direct traces, or only on long thinking-style traces).

| Model | Records | Style | Median `completion_tokens` |
|---|---:|---|---:|
| `openai/gpt-oss-120b` | 22,125 | Compact, direct | 1,483 |
| `deepseek-ai/DeepSeek-V3.2` | 37,951 | Compact, direct | 1,524 |
| `Qwen/Qwen3-Next-80B-A3B-Thinking` | 4,687 | Long thinking-style | 12,302 |

> ⚠ **Per-model record counts reflect experimental rollout, not relative model capability.** Each model was run on a different subset of problems depending on cost, rate limits, and rollout phase. Do not read "more records = better model"; use `irt_difficulty` on the task side together with jointly-fit model skill if you need capability comparisons.

All three models were queried with `temperature=0` and `max_completion_tokens=32768`.

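As a rough illustration of filtering by model, the sketch below groups records by trace style and computes a per-model median of `completion_tokens`. The `MODEL_STYLE` mapping is copied from the table above; the helper names and the toy records are hypothetical and exist only for this example.

```python
from statistics import median

# Style labels taken from the "Coverage by Model" table.
MODEL_STYLE = {
    "openai/gpt-oss-120b": "compact",
    "deepseek-ai/DeepSeek-V3.2": "compact",
    "Qwen/Qwen3-Next-80B-A3B-Thinking": "thinking",
}


def select_by_style(records, style):
    """Keep records whose `model` maps to the requested style."""
    return [r for r in records if MODEL_STYLE.get(r["model"]) == style]


def median_tokens_by_model(records):
    """Return the median `completion_tokens` for each model id."""
    by_model = {}
    for r in records:
        by_model.setdefault(r["model"], []).append(r["completion_tokens"])
    return {model: median(tokens) for model, tokens in by_model.items()}


# Toy records, made up for illustration (not taken from the dataset).
toy = [
    {"model": "openai/gpt-oss-120b", "completion_tokens": 1400},
    {"model": "openai/gpt-oss-120b", "completion_tokens": 1600},
    {"model": "Qwen/Qwen3-Next-80B-A3B-Thinking", "completion_tokens": 12000},
]

compact = select_by_style(toy, "compact")
stats = median_tokens_by_model(toy)
```

The same functions accept any iterable of dict-like records, so they should also work on rows of the loaded `train` split, although that has not been checked against the real data here.
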
The `solution` field contains the raw provider response: `gpt-oss-120b` and `DeepSeek-V3.2` emit compact analytical prose straight through; `Qwen3-Next-80B-A3B-Thinking` produces long step-by-step reasoning. No reasoning/analysis channels are stripped.

## Loading

```python
from datasets import load_dataset

# Load the single "train" split of the default config.
ds = load_dataset(
    "mihailgribov/olympiad_style_integer_math_reasoning",
    split="train",
)

# Inspect one record.
sample = ds[0]
print(sample["model"])
print(sample["problem"])
print(sample["solution"])
print(sample["answer"])

# Filter to a specific model
oss = ds.filter(lambda x: x["model"] == "openai/gpt-oss-120b")
```

A small inspection sample (10 random records) is also shipped as `reasoning_sample_10.jsonl` for quick manual browsing without downloading the full parquet.

## Record Schema

| Field | Type | Description |
|---|---|---|
| `id` | `str` | 6-char hex hash of the task graph. Stable across releases. |
| `problem` | `str` | Problem text (LaTeX math). |
| `answer` | `int` | Expected integer answer in `[0, 99999]`. |
| `domain` | `str` | `NT`, `COMB`, `ALG`, or `GEOM`. |
| `olympiad_level` | `int` | Problem sophistication level (2–9). |
| `root_lemma` | `str \| null` | Root lemma for the task (when available). |
| `irt_difficulty` | `object \| null` | IRT-1PL difficulty: `{lo, mid, hi}`. |
| `model` | `str` | Source model id. |
| `solution` | `str` | Full raw model response (reasoning + final `\boxed{...}`). |
| `prompt_tokens` | `int` | Prompt tokens (from the provider's usage report). |
| `completion_tokens` | `int` | Completion tokens. |
| `source_corpus_version` | `str` | Pinned problem-set version: the four-part build id of the source corpus release this record was filtered against. |
| `license` | `str` | `"CC BY 4.0"`. |

<details>
<summary>Example record</summary>

```json
{
  "id": "91c4ce",
  "problem": "Let $n = 15120$. Let $a = 15$. Let $S$ be the set of all ordered pairs (x, y) of positive integers such that $xy = 2298256$. Define $b$ to be the minimum value of $x + y$ as $(x, y)$ ranges over $S$. Compute the number of positive divisors $d$ of $n$ such that $a \\leq d \\leq b$. Let this number be $r$. Find the value of $68265 - r$.",
  "answer": 68201,
  "domain": "NT",
  "olympiad_level": 4,
  "root_lemma": "B3",
  "irt_difficulty": {"lo": 1.49, "mid": 3.56, "hi": 5.38},
  "model": "openai/gpt-oss-120b",
  "solution": "We need to parse problem.\n\nGiven n=15120. a=15. ... [full step-by-step reasoning] ...\n\n\\[\n68265 - r = 68265 - 64 = 68201 .\n\\]\n\n\\[\n\\boxed{68201}\n\\]",
  "prompt_tokens": 228,
  "completion_tokens": 3645,
  "source_corpus_version": "2.0.4.32",
  "license": "CC BY 4.0"
}
```

</details>

## Filtering Criteria

Every retained trace satisfies **all** of the following:

* **Strict correctness.** The model's boxed answer matches the expected integer answer exactly.
* **Non-truncated.** `completion_tokens < 32,000` (below the 32768 generation cap). Responses that hit the cap are dropped because they are likely clipped mid-reasoning.
* **Terminal boxed answer.** A final `\boxed{...}` marker is present in the last 500 characters of the response, meaning the model reached a final answer before the response ended. The 500-character window is generous so that it includes short trailing explanations after the box.

## Intended Use

- **Supervised fine-tuning** on chain-of-thought math reasoning with verified integer answers.
- **Multi-teacher distillation**: train a student to match the reasoning style of different teacher models on the same problem set.
- **Reasoning-format experiments**: compare SFT on long thinking-style traces (Qwen-Thinking, median ~12 K tokens) versus compact analytical traces (gpt-oss-120b / DeepSeek-V3.2, median ~1.5 K tokens).
  Because every model solves the *same* underlying problems, the dataset provides a controlled setting for studying which reasoning format transfers best through distillation, a format-level analogue of the "model trajectory shaping" use case described for the source corpus.

## Source of the Problems

The problems come from the [`mihailgribov/olympiad_style_integer_math_problems`](https://huggingface.co/datasets/mihailgribov/olympiad_style_integer_math_problems) dataset. This release was built from its **v2.0.4.32** snapshot; the exact four-part build identifier is stored in the `source_corpus_version` field of every record so the join is reproducible across future source-corpus rebuilds. The source corpus provides additional task-level metadata (computation graph, lemma structure, IRT difficulty, per-solver aggregates, etc.) joinable by `id`.

## Citation

```bibtex
@misc{gribov2026olympiad-reasoning,
  author       = {Gribov, Mikhail},
  title        = {Olympiad Math Reasoning Traces},
  year         = {2026},
  version      = {v1.0.2},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/mihailgribov/olympiad_style_integer_math_reasoning}},
  license      = {CC BY 4.0}
}
```

If you use this dataset, please also cite the source corpus: [`olympiad_style_integer_math_problems`](https://huggingface.co/datasets/mihailgribov/olympiad_style_integer_math_problems).

## License

Released under **Creative Commons Attribution 4.0 International** ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)). Each record carries `"license": "CC BY 4.0"`.
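
For readers who want to re-check records themselves, the three conditions in the Filtering Criteria section can be approximated as below. This is a minimal sketch, not the pipeline used to build the release: the function name `passes_filters`, the regular expression, and the choice to compare the last box found in the 500-character tail are all assumptions, and the actual extraction logic (for example, how non-digit box contents are handled) is not described in the card.

```python
import re

TOKEN_LIMIT = 32_000  # "completion_tokens < 32,000" from the card
TAIL_CHARS = 500      # window in which the final box must appear

# Assumed pattern: \boxed{...} containing only an optionally signed integer.
_BOXED_INT = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def passes_filters(record):
    """Return True if a record meets all three criteria (approximation)."""
    tail = record["solution"][-TAIL_CHARS:]
    boxed = _BOXED_INT.findall(tail)
    if not boxed:
        return False  # no terminal boxed answer in the tail
    if int(boxed[-1]) != record["answer"]:
        return False  # boxed value differs from the expected integer
    return record["completion_tokens"] < TOKEN_LIMIT


# Abbreviated version of the example record shown in the card.
example = {
    "solution": "We need to parse problem.\n\n\\[\n\\boxed{68201}\n\\]",
    "answer": 68201,
    "completion_tokens": 3645,
}
is_kept = passes_filters(example)
```

Because every record in the release is already filtered, a check like this would be expected to pass on all rows; it is mostly useful as a sanity test after custom preprocessing.
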
Provider:
mihailgribov