mihailgribov/olympiad_style_integer_math_reasoning
Hugging Face · updated 2026-04-19 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/mihailgribov/olympiad_style_integer_math_reasoning
Dataset description:
---
license: cc-by-4.0
pretty_name: Olympiad Math Reasoning Traces
task_categories:
- text-generation
- question-answering
language:
- en
configs:
- config_name: default
  data_files:
  - split: train
    path: reasoning.parquet
---
# Olympiad Math Reasoning Traces
**Version:** v1.0.2<br>
**Release date:** 2026-04-19
**64,763** full model reasoning traces for olympiad-style math problems with verified integer answers. This dataset contains **only correct and non-truncated** traces — every record contains a terminal `\boxed{...}` answer (within the last 500 characters of the response) that matches the expected integer exactly, and none of the responses hit the model's generation-token cap. Intended for distillation and supervised fine-tuning on chain-of-thought math reasoning. Each record is self-contained — problem text, expected answer, and the model's complete response are all included.
## Coverage by Model
Every trace in this dataset comes from one of three solver models. The `model` field distinguishes them, so consumers can filter by model (e.g. to train only on compact direct traces, or only on long thinking-style traces).
| Model | Records | Style | Median `completion_tokens` |
|---|---:|---|---:|
| `openai/gpt-oss-120b` | 22,125 | Compact, direct | 1,483 |
| `deepseek-ai/DeepSeek-V3.2` | 37,951 | Compact, direct | 1,524 |
| `Qwen/Qwen3-Next-80B-A3B-Thinking` | 4,687 | Long thinking-style | 12,302 |
> ⚠ **Per-model record counts reflect experimental rollout, not relative model capability.** Each model was run on a different subset of problems depending on cost, rate limits, and rollout phase. Do not read "more records = better model"; use `irt_difficulty` on the task side together with jointly-fit model skill if you need capability comparisons.
All three models were queried with `temperature=0` and `max_completion_tokens=32768`. The `solution` field contains the raw provider response — `gpt-oss-120b` and `DeepSeek-V3.2` emit compact analytical prose straight through; `Qwen3-Next-80B-A3B-Thinking` produces long step-by-step reasoning. No reasoning/analysis channels are stripped.
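The note above on capability comparisons can be made concrete with the standard 1PL (Rasch) formula, in which the probability of a correct answer is a logistic function of solver skill minus task difficulty. The sketch below is illustrative only: the `skill` value is hypothetical (this dataset does not ship fitted model skills), and it assumes `irt_difficulty.mid` lies on the usual logit scale, which is not confirmed here.

```python
import math


def rasch_p_correct(skill, difficulty):
    """Standard 1PL (Rasch) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty)))


def expected_solve_rate(skill, records):
    """Mean predicted success over records that carry an irt_difficulty.

    `skill` is a hypothetical, externally fitted parameter; records whose
    irt_difficulty is null are skipped. Returns None if nothing is usable.
    """
    mids = [r["irt_difficulty"]["mid"] for r in records if r.get("irt_difficulty")]
    if not mids:
        return None
    return sum(rasch_p_correct(skill, b) for b in mids) / len(mids)
```

Under this model a solver whose skill equals a task's difficulty is predicted to succeed half the time, which is why raw record counts say little about capability.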
## Loading
```python
from datasets import load_dataset

ds = load_dataset(
    "mihailgribov/olympiad_style_integer_math_reasoning",
    split="train",
)

sample = ds[0]
print(sample["model"])
print(sample["problem"])
print(sample["solution"])
print(sample["answer"])

# Filter to a specific model
oss = ds.filter(lambda x: x["model"] == "openai/gpt-oss-120b")
```
A small inspection sample — 10 random records — is also shipped as `reasoning_sample_10.jsonl` for quick manual browsing without downloading the full parquet.
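For browsing the sample file without the `datasets` library, a plain JSON Lines reader is enough. This is a minimal sketch using only the standard library; the helper name `read_jsonl` is made up for illustration, and it assumes the file has already been downloaded locally and holds one JSON object per line.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


# Usage, assuming the sample sits next to the script:
# for record in read_jsonl("reasoning_sample_10.jsonl"):
#     print(record["id"], record["model"], record["completion_tokens"])
```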
## Record Schema
| Field | Type | Description |
|---|---|---|
| `id` | `str` | 6-char hex hash of the task graph. Stable across releases. |
| `problem` | `str` | Problem text (LaTeX math). |
| `answer` | `int` | Expected integer answer in `[0, 99999]`. |
| `domain` | `str` | `NT`, `COMB`, `ALG`, or `GEOM`. |
| `olympiad_level` | `int` | Problem sophistication level (2–9). |
| `root_lemma` | `str \| null` | Root lemma for the task (when available). |
| `irt_difficulty` | `object \| null` | IRT-1PL difficulty: `{lo, mid, hi}`. |
| `model` | `str` | Source model id. |
| `solution` | `str` | Full raw model response (reasoning + final `\boxed{...}`). |
| `prompt_tokens` | `int` | Prompt tokens (from the provider's usage report). |
| `completion_tokens` | `int` | Completion tokens. |
| `source_corpus_version` | `str` | Pinned problem-set version — four-part build id of the source corpus release this record was filtered against. |
| `license` | `str` | `"CC BY 4.0"`. |
<details>
<summary>Example record</summary>
```json
{
  "id": "91c4ce",
  "problem": "Let $n = 15120$. Let $a = 15$. Let $S$ be the set of all ordered pairs (x, y) of positive integers such that $xy = 2298256$. Define $b$ to be the minimum value of $x + y$ as $(x, y)$ ranges over $S$. Compute the number of positive divisors $d$ of $n$ such that $a \\leq d \\leq b$. Let this number be $r$. Find the value of $68265 - r$.",
  "answer": 68201,
  "domain": "NT",
  "olympiad_level": 4,
  "root_lemma": "B3",
  "irt_difficulty": {"lo": 1.49, "mid": 3.56, "hi": 5.38},
  "model": "openai/gpt-oss-120b",
  "solution": "We need to parse problem.\n\nGiven n=15120. a=15. ... [full step-by-step reasoning] ...\n\n\\[\n68265 - r = 68265 - 64 = 68201 .\n\\]\n\n\\[\n\\boxed{68201}\n\\]",
  "prompt_tokens": 228,
  "completion_tokens": 3645,
  "source_corpus_version": "2.0.4.32",
  "license": "CC BY 4.0"
}
```
</details>
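The constraints in the schema table can be checked mechanically before training on the data. The following is a sketch, not a validator shipped with the dataset: the function name `schema_problems` is hypothetical, it covers only the fields with explicit constraints in the table, and it assumes the hex `id` may be in either case.

```python
import re

DOMAINS = {"NT", "COMB", "ALG", "GEOM"}
HEX_ID = re.compile(r"[0-9a-fA-F]{6}")  # assumption: either case is acceptable


def schema_problems(record):
    """Return a list of schema violations; an empty list means none were found."""
    problems = []
    rid = record.get("id")
    if not (isinstance(rid, str) and HEX_ID.fullmatch(rid)):
        problems.append("id: expected a 6-character hex string")
    answer = record.get("answer")
    if not (isinstance(answer, int) and 0 <= answer <= 99999):
        problems.append("answer: expected an integer in [0, 99999]")
    if record.get("domain") not in DOMAINS:
        problems.append("domain: expected NT, COMB, ALG, or GEOM")
    level = record.get("olympiad_level")
    if not (isinstance(level, int) and 2 <= level <= 9):
        problems.append("olympiad_level: expected an integer in 2..9")
    irt = record.get("irt_difficulty")
    if irt is not None and not {"lo", "mid", "hi"} <= set(irt):
        problems.append("irt_difficulty: expected keys lo, mid, hi (or null)")
    for field in ("problem", "model", "solution"):
        if not isinstance(record.get(field), str):
            problems.append(f"{field}: expected a string")
    return problems
```

Returning a list of messages rather than raising on the first failure makes it easier to summarize every issue in a record at once.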
## Filtering Criteria
Every retained trace satisfies **all** of the following:
* **Strict correctness.** The model's boxed answer matches the expected integer answer exactly.
* **Non-truncated.** `completion_tokens < 32000`, a threshold set below the 32,768-token generation cap. Responses that hit the cap are dropped because they are likely clipped mid-reasoning.
* **Terminal boxed answer.** A final `\boxed{...}` marker is present in the last 500 characters of the response, meaning the model reached a final answer before the response ended. The 500-character window is deliberately generous so that a short trailing explanation after the box does not disqualify a trace.
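The three criteria above can be re-checked on any record with a few lines of Python. This is a sketch under stated assumptions rather than the pipeline actually used to build the dataset: it assumes the box contains a plain integer literal, takes the last box found in the 500-character tail, and the name `passes_filter` is hypothetical.

```python
import re

TAIL_CHARS = 500                 # window for the terminal boxed answer
MAX_COMPLETION_TOKENS = 32_000   # threshold below the 32768 generation cap
# Assumption: the boxed content is a bare integer such as \boxed{68201}.
BOXED_INT = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def passes_filter(record):
    """Return True if a record meets all three retention criteria."""
    # Non-truncated: stay strictly under the token threshold.
    if record["completion_tokens"] >= MAX_COMPLETION_TOKENS:
        return False
    # Terminal boxed answer: look only at the end of the response.
    tail = record["solution"][-TAIL_CHARS:]
    matches = BOXED_INT.findall(tail)
    if not matches:
        return False
    # Strict correctness: the last boxed integer must equal the expected answer.
    return int(matches[-1]) == record["answer"]
```

Since the released data is already filtered, every record should pass; a check like this is mainly useful when applying the same criteria to newly generated traces.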
## Intended Use
- **Supervised fine-tuning** on chain-of-thought math reasoning with verified integer answers.
- **Multi-teacher distillation:** train a student to match the reasoning style of different teacher models on the same problem set.
- **Reasoning-format experiments:** compare SFT on long thinking-style traces (Qwen-Thinking, median ~12K tokens) against compact analytical traces (gpt-oss-120b / DeepSeek-V3.2, median ~1.5K tokens). All three models draw on the same source problem corpus, although each was run on a different subset, so restricting to problems whose `id` appears under more than one model gives a controlled setting for studying which reasoning format transfers best through distillation. This is a format-level analogue of the "model trajectory shaping" use case described for the source corpus.
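For the multi-teacher and format-comparison uses above, traces first need to be grouped by problem, because each model was run on a different subset. The sketch below groups records by `id` and keeps problems solved by at least two models. The function names are hypothetical, and it assumes at most one trace per `(id, model)` pair; if duplicates exist, the later one overwrites the earlier one.

```python
from collections import defaultdict


def traces_by_problem(records):
    """Map each problem id to {model id: solution text}."""
    grouped = defaultdict(dict)
    for record in records:
        grouped[record["id"]][record["model"]] = record["solution"]
    return dict(grouped)


def multi_teacher_problems(records, min_teachers=2):
    """Keep only problems that have traces from at least `min_teachers` models."""
    return {
        problem_id: by_model
        for problem_id, by_model in traces_by_problem(records).items()
        if len(by_model) >= min_teachers
    }
```

Passing the loaded `train` split (or any iterable of record dicts) yields paired traces that can be compared across reasoning styles on identical problems.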
## Source of the Problems
The problems come from the [`mihailgribov/olympiad_style_integer_math_problems`](https://huggingface.co/datasets/mihailgribov/olympiad_style_integer_math_problems) dataset. This release was built from its **v2.0.4.32** snapshot; the exact four-part build identifier is stored in the `source_corpus_version` field of every record so the join is reproducible across future source-corpus rebuilds. The source corpus provides additional task-level metadata (computation graph, lemma structure, IRT difficulty, per-solver aggregates, etc.) joinable by `id`.
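Joining on `id` needs nothing more than a dictionary lookup. The sketch below attaches selected source-corpus fields to each trace. The function name and the `computation_graph` field used in the usage comment are hypothetical placeholders, since the source corpus's column names are not listed here.

```python
def attach_source_metadata(traces, source_rows, fields):
    """Return copies of the traces enriched with the chosen source-corpus fields.

    Traces whose id is missing from the source rows are skipped, and the
    input records are left unmodified.
    """
    source_by_id = {row["id"]: row for row in source_rows}
    joined = []
    for trace in traces:
        row = source_by_id.get(trace["id"])
        if row is None:
            continue
        merged = dict(trace)  # shallow copy so the original trace is untouched
        for field in fields:
            merged[field] = row.get(field)
        joined.append(merged)
    return joined


# Usage with a hypothetical source field name:
# enriched = attach_source_metadata(ds, source_ds, ["computation_graph"])
```

Before joining, it is worth confirming that the source snapshot matches each record's `source_corpus_version`, since that is what keeps the join reproducible.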
## Citation
```bibtex
@misc{gribov2026olympiad-reasoning,
  author       = {Gribov, Mikhail},
  title        = {Olympiad Math Reasoning Traces},
  year         = {2026},
  version      = {v1.0.2},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/mihailgribov/olympiad_style_integer_math_reasoning}},
  license      = {CC BY 4.0}
}
```
If you use this dataset, please also cite the source corpus: [`olympiad_style_integer_math_problems`](https://huggingface.co/datasets/mihailgribov/olympiad_style_integer_math_problems).
## License
Released under **Creative Commons Attribution 4.0 International** ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)).
Each record carries `"license": "CC BY 4.0"`.
Provider:
mihailgribov



