fuvty/mini-opd-deepmath

Name: fuvty/mini-opd-deepmath
Creator: fuvty
Published: 2026-04-07 18:03:42
License: 暂无描述

Hugging Face2026-04-07 更新2026-04-12 收录

下载链接：

https://hf-mirror.com/datasets/fuvty/mini-opd-deepmath

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: apache-2.0 pretty_name: Mini-OPD DeepMath configs: - config_name: opd data_files: - split: train path: "opd/*.parquet" - config_name: sft data_files: - split: train path: "sft/*.parquet" - config_name: sft_qwen3_4b data_files: - split: train path: "sft_qwen3_4b/*.parquet" - config_name: test data_files: - split: test path: "test/*.parquet" --- # Mini-OPD DeepMath Private dataset package for Stage 0 / Stage 1 work in `mini-opd`. Repo: `fuvty/mini-opd-deepmath` Owner: `fuvty` ## Configs ### `opd` Head of the DeepMath train split reserved for later on-policy distillation experiments. - Split: `train` - Rows: 53,000 - Columns: `question`, `ground_truth`, `source_index` - Source indices: `0..52999` ### `sft` Raw SFT candidate pool from the tail of the DeepMath train split before teacher filtering. - Split: `train` - Rows: 44,870 - Columns: `question`, `ground_truth`, `source_index` - Source indices: `53000..97869` ### `sft_qwen3_4b` Teacher-generated conversations kept from the Stage 0 rollout run with `Qwen/Qwen3-4B`. - Split: `train` - Rows: 35,256 - Columns: `question`, `ground_truth`, `source_index`, `messages` - Source indices: subset of `53000..97869` Important: this config contains only the retained correct conversations from the saved Stage 0 run. Incorrect teacher completions were not persisted in that run and therefore are not included in this release. ### `test` Official DeepMath test split for validation/evaluation. - Split: `test` - Rows: 5,152 - Columns: `question`, `ground_truth`, `source_index` ## Source Dataset - Source: `trl-lib/DeepMath-103K` - Observed train rows in this environment: 97,870 - Observed test rows in this environment: 5,152 ## Stage 0 Teacher Rollout Metadata - Teacher model: `Qwen/Qwen3-4B` - Sampling: `temperature=0.6`, `top_p=0.95`, `max_tokens=14336` - Parallel rollout setting used in the run: `dp=8` - Filter: keep only completions that pass `score_boxed_math_answer` - Saved correct rate: 35,256 / 44,870 = 78.6% ## Notes - `source_index` always refers to the index inside the original source split. - `ground_truth` is the gold DeepMath answer, not the teacher completion. - In `sft_qwen3_4b`, the full saved teacher conversation is stored in `messages` as a JSON string.

提供机构：

fuvty

5,000+

优质数据集

54 个

任务类型

进入经典数据集