96kevinli29/SFT-Dataset
收藏Hugging Face2026-03-27 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/96kevinli29/SFT-Dataset
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
license: other
tags:
- sft
- supervised-fine-tuning
- math
- reasoning
- code
- science
- parquet
pretty_name: SFT-Dataset
size_categories:
- 10K<n<100K
task_categories:
- text-generation
---
# SFT-Dataset
**Appropriate** quantity, **high** quality, and **a balanced** recipe line up for **supervised fine-tuning of a base model** (for example `Qwen/Qwen3-4B-Base` or `Qwen/Qwen3-8B-Base`). The resulting policy forms a **good foundation for later reinforcement learning**. [`Qwen3-4B-SFT`](https://huggingface.co/96kevinli29/Qwen3-4B-SFT) is trained on this same mixture; the benchmarks on its model card illustrate the outcome.
## Upstream mix (linked; target counts at build time)
| `data_source` | Dataset | Target |
|---------------|---------|--------|
| `openr1_math` | [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) | 15k |
| `numina_cot` | [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | 10k |
| `magpie_pro` | [Magpie-Llama-3.1-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered) | 15k |
| `codefeedback` | [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) | 5k |
| `scienceqa` | [ScienceQA](https://huggingface.co/datasets/TheMrguiller/ScienceQA) | ~3.4k |
| `science_sft` | In-house GPQA-aligned science slice (not a separate Hub dataset) | ~1.5k |
## Data format
- **Splits (Hub):** ~49k `train.parquet` / ~1k `test.parquet`—confirm on the dataset card.
- **Columns:** `messages`, `data_source`, `category`.
- **Style:** Mixed assistants—many math/science rows use Qwen-style `</think>` … `</think>`; logic/code often plain answers. **Match your base model’s chat template and thinking policy.**
## Links
- **SFT model:** [`96kevinli29/Qwen3-4B-SFT-Math`](https://huggingface.co/96kevinli29/Qwen3-4B-SFT)
- **Training code:** [`96kevinli29/base-model-sft-verl`](https://github.com/96kevinli29/base-model-sft-verl)
- **Base model:** [`Qwen/Qwen3-4B-Base`](https://huggingface.co/Qwen/Qwen3-4B-Base)
## Citation
If you use this mixture, cite this dataset and each upstream source you rely on.
```bibtex
@misc{dataset-sft-math-2025,
title = {SFT-Dataset: Mixed High-Difficulty Corpus for Reasoning SFT},
author = {Hongyang Li, Xiao Li},
year = {2026},
howpublished = {Hugging Face},
url = {https://huggingface.co/datasets/96kevinli29/SFT-Dataset},
note = {Recipe ~50/30/10/10; strict QC; intended pre-RL warm-start (same data as Qwen3-4B-SFT).}
}
```
## License
**Composite.** Hub may show **Other**; comply with **each** upstream.
---
language:
- 英语
license: 其他
tags:
- SFT
- 监督微调(supervised fine-tuning)
- 数学
- 推理
- 代码
- 科学
- Parquet格式数据集
pretty_name: SFT-Dataset
size_categories:
- 10K<n<100K
task_categories:
- 文本生成
---
# SFT-Dataset
**适配的样本规模**、**优质的数据质量**与**均衡的组合配比**,专为基础模型的监督微调(supervised fine-tuning)设计(例如 `Qwen/Qwen3-4B-Base` 或 `Qwen/Qwen3-8B-Base`)。经此数据集微调得到的模型策略,可为后续的强化学习搭建良好的基础。`Qwen3-4B-SFT` 模型即基于此混合数据集训练完成,其模型卡片上的评测结果可直观体现该数据集的训练效果。
## 上游数据源组合(已关联;构建时目标样本量)
| 数据来源 | 数据集 | 目标样本量 |
|---------|---------|---------|
| `openr1_math` | [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) | 15k |
| `numina_cot` | [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | 10k |
| `magpie_pro` | [Magpie-Llama-3.1-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered) | 15k |
| `codefeedback` | [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) | 5k |
| `scienceqa` | [ScienceQA](https://huggingface.co/datasets/TheMrguiller/ScienceQA) | ~3.4k |
| `science_sft` | 内部构建的与GPQA对齐的科学领域子集(非独立的Hugging Face Hub数据集) | ~1.5k |
## 数据格式
- **数据集划分(Hub端)**:约49k条 `train.parquet` / 约1k条 `test.parquet`,具体请以数据集卡片为准。
- **数据列**:`messages`、`data_source`、`category`。
- **数据风格**:混合了多种助手对话格式——多数数学与科学领域样本采用Qwen风格的 `</think>` … `</think>` 思考标记;逻辑与代码类样本通常为直接作答形式。**请匹配您所用基础模型的对话模板与思考策略**。
## 相关链接
- **SFT模型**:[`96kevinli29/Qwen3-4B-SFT-Math`](https://huggingface.co/96kevinli29/Qwen3-4B-SFT)
- **训练代码**:[`96kevinli29/base-model-sft-verl`](https://github.com/96kevinli29/base-model-sft-verl)
- **基础模型**:[`Qwen/Qwen3-4B-Base`](https://huggingface.co/Qwen/Qwen3-4B-Base)
## 引用说明
若您使用本混合数据集,请同时引用本数据集及您所依赖的各上游数据源。
bibtex
@misc{dataset-sft-math-2025,
title = {SFT-Dataset: 面向推理监督微调的混合高难度语料库},
author = {Hongyang Li, Xiao Li},
year = {2026},
howpublished = {Hugging Face},
url = {https://huggingface.co/datasets/96kevinli29/SFT-Dataset},
note = {配比约为50/30/10/10;采用严格质量控制;旨在用于强化学习前的预热微调(与Qwen3-4B-SFT所用数据一致)}
}
## 许可协议
**复合许可**。Hugging Face Hub可能显示为“其他”,请遵守各上游数据源的许可协议。
提供机构:
96kevinli29



