five

smirki/combined-sft-dataset

收藏
Hugging Face2026-02-25 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/smirki/combined-sft-dataset
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 task_categories: - text-generation language: - en tags: - sft - reasoning - combined - llada2 pretty_name: Combined SFT Dataset for LLaDA2 --- # Combined SFT Dataset Unified dataset combining multiple sources for LLaDA2 SFT training. ## Format JSONL with `messages` (array of `{role, content}` objects) and `source` (string) per row. The last message in every row has `role: "assistant"`. ## Sources | Source | Description | |--------|-------------| | `opus-4.6-reasoning-3000x` | Opus 4.6 reasoning (filtered) | | `claude-4.5-opus-reasoning-250x` | Claude 4.5 Opus high reasoning | | `openresearcher` | OpenResearcher research QA | | `toolmind-web-qa` | ToolMind Web QA with tool use | | `pony-alpha-15k` | Pony Alpha 15k | | `gemini-3-pro-reasoning-250x` | Gemini 3 Pro reasoning | | `claude-code-personal` | Personal Claude Code conversations | | `fineproofs-sft` | FineProofs math proofs | | `real-slop` | Real-Slop reasoning traces | | `stem-reasoning-complex` | STEM reasoning complex | | `terminal-bench-2` | Terminal-Bench 2.0 successful trials | ## Usage ```python from datasets import load_dataset ds = load_dataset("smirki/combined-sft-dataset", split="train") # Filter by source opus_data = ds.filter(lambda x: x["source"] == "opus-4.6-reasoning-3000x") ```
提供机构:
smirki
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作