
96kevinli29/SFT-Dataset

Hugging Face · updated 2026-03-27 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/96kevinli29/SFT-Dataset
Resource summary:
---
language:
- en
license: other
tags:
- sft
- supervised-fine-tuning
- math
- reasoning
- code
- science
- parquet
pretty_name: SFT-Dataset
size_categories:
- 10K<n<100K
task_categories:
- text-generation
---

# SFT-Dataset

An **appropriately sized**, **high-quality**, and **balanced** mixture for **supervised fine-tuning of a base model** (for example `Qwen/Qwen3-4B-Base` or `Qwen/Qwen3-8B-Base`). The resulting policy is intended as a **good foundation for later reinforcement learning**. [`Qwen3-4B-SFT`](https://huggingface.co/96kevinli29/Qwen3-4B-SFT) is trained on this same mixture; the benchmarks on its model card illustrate the outcome.

## Upstream mix (linked; target counts at build time)

| `data_source` | Dataset | Target rows |
|---------------|---------|-------------|
| `openr1_math` | [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) | 15k |
| `numina_cot` | [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | 10k |
| `magpie_pro` | [Magpie-Llama-3.1-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered) | 15k |
| `codefeedback` | [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) | 5k |
| `scienceqa` | [ScienceQA](https://huggingface.co/datasets/TheMrguiller/ScienceQA) | ~3.4k |
| `science_sft` | In-house GPQA-aligned science slice (not a separate Hub dataset) | ~1.5k |

## Data format

- **Splits (Hub):** ~49k rows in `train.parquet` and ~1k rows in `test.parquet`; confirm the exact counts on the dataset card.
- **Columns:** `messages`, `data_source`, `category`.
- **Style:** Mixed assistant formats. Many math and science rows wrap their reasoning in Qwen-style `<think>` … `</think>` tags; logic and code rows are often plain answers. **Match your base model's chat template and thinking policy.**

## Links

- **SFT model:** [`96kevinli29/Qwen3-4B-SFT`](https://huggingface.co/96kevinli29/Qwen3-4B-SFT)
- **Training code:** [`96kevinli29/base-model-sft-verl`](https://github.com/96kevinli29/base-model-sft-verl)
- **Base model:** [`Qwen/Qwen3-4B-Base`](https://huggingface.co/Qwen/Qwen3-4B-Base)

## Citation

If you use this mixture, cite this dataset and each upstream source you rely on.

```bibtex
@misc{dataset-sft-math-2025,
  title        = {SFT-Dataset: Mixed High-Difficulty Corpus for Reasoning SFT},
  author       = {Hongyang Li and Xiao Li},
  year         = {2026},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/datasets/96kevinli29/SFT-Dataset},
  note         = {Recipe ~50/30/10/10; strict QC; intended pre-RL warm-start (same data as Qwen3-4B-SFT).}
}
```

## License

**Composite.** The Hub may show **Other**; comply with the license of **each** upstream dataset.
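The target counts in the upstream-mix table sum to 49.9k rows, consistent with the ~49k/~1k train/test split. The Python sketch below reproduces that arithmetic and adds a per-source tally you could run against loaded rows. The `GROUPS` mapping is an assumption: it is one reading of the "~50/30/10/10" recipe in the citation note (math, general instruction, code, science), which the card does not spell out. Rows are assumed to be dicts with a `data_source` key, for example from `pandas.read_parquet(...).to_dict("records")`; the helper names are hypothetical.

```python
from collections import Counter

# Target row counts copied from the upstream-mix table ("~" values taken as written).
TARGETS = {
    "openr1_math": 15_000,
    "numina_cot": 10_000,
    "magpie_pro": 15_000,
    "codefeedback": 5_000,
    "scienceqa": 3_400,
    "science_sft": 1_500,
}

# ASSUMPTION: one reading of the "~50/30/10/10" recipe note; the card does not define it.
GROUPS = {
    "math": ["openr1_math", "numina_cot"],
    "general": ["magpie_pro"],
    "code": ["codefeedback"],
    "science": ["scienceqa", "science_sft"],
}


def group_shares(targets, groups):
    """Return each group's fraction of the total target row count."""
    total = sum(targets.values())
    return {name: sum(targets[s] for s in sources) / total for name, sources in groups.items()}


def tally_sources(rows):
    """Count rows per `data_source` value (rows are dicts with a `data_source` key)."""
    return Counter(row["data_source"] for row in rows)


def diff_from_targets(rows, targets):
    """Observed minus target count per source; negative means fewer rows than targeted."""
    observed = tally_sources(rows)
    return {name: observed.get(name, 0) - target for name, target in targets.items()}


if __name__ == "__main__":
    print(sum(TARGETS.values()))  # 49900
    for name, share in group_shares(TARGETS, GROUPS).items():
        print(f"{name:8s} {share:.1%}")
```

Under this grouping the shares come out near 50/30/10/10. Because the table lists build-time targets and part of the data forms the test split, the train split alone need not match the table exactly.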
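Because some assistant turns carry Qwen-style `<think>` … `</think>` reasoning and others do not, you may need to normalize turns before applying a chat template. The following is a minimal Python sketch. It assumes each `messages` value is a list of `{"role": ..., "content": ...}` dicts (a common chat format that the card does not spell out) and that any reasoning appears as a single leading `<think>` block. The helper names are hypothetical.

```python
import re

# Matches one leading <think>...</think> block (non-greedy, spanning newlines)
# plus any whitespace that follows it.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_thinking(text):
    """Split an assistant message into (reasoning, answer); reasoning is None if absent."""
    match = THINK_RE.match(text)
    if match is None:
        return None, text
    return match.group(1).strip(), text[match.end():]


def drop_thinking(messages):
    """Return a copy of `messages` with leading reasoning removed from assistant turns."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant":
            _, answer = split_thinking(msg["content"])
            msg = {**msg, "content": answer}  # new dict, so the input is not mutated
        cleaned.append(msg)
    return cleaned


def has_thinking(messages):
    """True if any assistant turn starts with a <think> block."""
    return any(
        msg.get("role") == "assistant" and THINK_RE.match(msg["content"])
        for msg in messages
    )
```

Whether to keep or drop the reasoning depends on the thinking policy you choose for the base model. Dropping it suits a non-thinking template; keeping it suits a model meant to emit `<think>` blocks.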
Provider:
96kevinli29