smirki/combined-sft-dataset

Name: smirki/combined-sft-dataset
Creator: smirki
Published: 2026-02-25 10:21:28
License: 暂无描述

Hugging Face2026-02-25 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/smirki/combined-sft-dataset

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: apache-2.0 task_categories: - text-generation language: - en tags: - sft - reasoning - combined - llada2 pretty_name: Combined SFT Dataset for LLaDA2 --- # Combined SFT Dataset Unified dataset combining multiple sources for LLaDA2 SFT training. ## Format JSONL with `messages` (array of `{role, content}` objects) and `source` (string) per row. The last message in every row has `role: "assistant"`. ## Sources | Source | Description | |--------|-------------| | `opus-4.6-reasoning-3000x` | Opus 4.6 reasoning (filtered) | | `claude-4.5-opus-reasoning-250x` | Claude 4.5 Opus high reasoning | | `openresearcher` | OpenResearcher research QA | | `toolmind-web-qa` | ToolMind Web QA with tool use | | `pony-alpha-15k` | Pony Alpha 15k | | `gemini-3-pro-reasoning-250x` | Gemini 3 Pro reasoning | | `claude-code-personal` | Personal Claude Code conversations | | `fineproofs-sft` | FineProofs math proofs | | `real-slop` | Real-Slop reasoning traces | | `stem-reasoning-complex` | STEM reasoning complex | | `terminal-bench-2` | Terminal-Bench 2.0 successful trials | ## Usage ```python from datasets import load_dataset ds = load_dataset("smirki/combined-sft-dataset", split="train") # Filter by source opus_data = ds.filter(lambda x: x["source"] == "opus-4.6-reasoning-3000x") ```

提供机构：

smirki

5,000+

优质数据集

54 个

任务类型

进入经典数据集