five

fuvty/mini-opd-deepmath

收藏
Hugging Face2026-04-07 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/fuvty/mini-opd-deepmath
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 pretty_name: Mini-OPD DeepMath configs: - config_name: opd data_files: - split: train path: "opd/*.parquet" - config_name: sft data_files: - split: train path: "sft/*.parquet" - config_name: sft_qwen3_4b data_files: - split: train path: "sft_qwen3_4b/*.parquet" - config_name: test data_files: - split: test path: "test/*.parquet" --- # Mini-OPD DeepMath Private dataset package for Stage 0 / Stage 1 work in `mini-opd`. Repo: `fuvty/mini-opd-deepmath` Owner: `fuvty` ## Configs ### `opd` Head of the DeepMath train split reserved for later on-policy distillation experiments. - Split: `train` - Rows: 53,000 - Columns: `question`, `ground_truth`, `source_index` - Source indices: `0..52999` ### `sft` Raw SFT candidate pool from the tail of the DeepMath train split before teacher filtering. - Split: `train` - Rows: 44,870 - Columns: `question`, `ground_truth`, `source_index` - Source indices: `53000..97869` ### `sft_qwen3_4b` Teacher-generated conversations kept from the Stage 0 rollout run with `Qwen/Qwen3-4B`. - Split: `train` - Rows: 35,256 - Columns: `question`, `ground_truth`, `source_index`, `messages` - Source indices: subset of `53000..97869` Important: this config contains only the retained correct conversations from the saved Stage 0 run. Incorrect teacher completions were not persisted in that run and therefore are not included in this release. ### `test` Official DeepMath test split for validation/evaluation. - Split: `test` - Rows: 5,152 - Columns: `question`, `ground_truth`, `source_index` ## Source Dataset - Source: `trl-lib/DeepMath-103K` - Observed train rows in this environment: 97,870 - Observed test rows in this environment: 5,152 ## Stage 0 Teacher Rollout Metadata - Teacher model: `Qwen/Qwen3-4B` - Sampling: `temperature=0.6`, `top_p=0.95`, `max_tokens=14336` - Parallel rollout setting used in the run: `dp=8` - Filter: keep only completions that pass `score_boxed_math_answer` - Saved correct rate: 35,256 / 44,870 = 78.6% ## Notes - `source_index` always refers to the index inside the original source split. - `ground_truth` is the gold DeepMath answer, not the teacher completion. - In `sft_qwen3_4b`, the full saved teacher conversation is stored in `messages` as a JSON string.
提供机构:
fuvty
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作