Mudassir41/ins_re-tuning
Hugging Face · Updated 2026-04-05 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/Mudassir41/ins_re-tuning
Description:
---
dataset_info:
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: thinking
dtype: string
splits:
- name: train
num_bytes: 179310054.0
num_examples: 19462
download_size: 79547638
dataset_size: 179310054.0
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- reasoning
- sft
- distillation
- thinking
- code
size_categories:
- 10K<n<100K
---
# Instruction & Reasoning Tuning Dataset
A curated mix of ~20K reasoning and coding examples from multiple teacher models, assembled for supervised fine-tuning of small language models (0.8B–4B parameters).
Qwen3.5 models appear to have very robust reasoning training, likely from RL in deterministic, programmatic logic environments.
That reasoning style is not ideal in practice, though, and needs further RL. Small models in particular must learn not to reason exhaustively or fall into loops, since most of what they see is novel to them.
This Phase 2 therefore tests how the model's reasoning changes when it is trained on these patterns from SOTA models (starting from the final instruction-tuned model plus a private fine-tuning dataset, ~10%).
## Dataset Composition
| Source | Examples | Teacher Model | Type |
|---|---|---|---|
| Hastagaras/Claude-Sonnet-X-Opus-4.6-Reasoning-small-500 | 524 | Claude 4.6 | Reasoning |
| Crownelius/Opus-4.6-Reasoning-3300x | 2,160 | Claude 4.6 | Reasoning |
| ykarout/Opus-4.6-reasoning-sft-12k | 4,000 | Claude 4.6 | Reasoning |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Qwen 3.5 | Reasoning (same-arch) |
| Roman1111111/gemini-3.1-pro-hard-high-reasoning | 300 | Gemini 3.1 | Reasoning |
| Roman1111111/gpt-5.4-step-by-step-reasoning | 1,500 | GPT 5.4 | Reasoning |
| ianncity/KIMI-K2.5-700000x | 3,000 | KIMI K2.5 | General + Reasoning |
| TeichAI/gpt-5.2-high-reasoning-250x | 249 | GPT 5.2 | Reasoning |
| TeichAI/Claude-Sonnet-4.6-Reasoning-1100x | 1,096 | Claude 4.6 | Reasoning |
| TeichAI/Hunter-Alpha-16k | 3,000 | Hunter Alpha | Coding Agent |
| TeichAI/gpt-5.1-codex-max-1000x | 1,000 | GPT 5.1 | Coding |
| ianncity/Hunter-Alpha-Programming-160000x | 2,000 | Hunter Alpha | Programming |
| REXX-NEW/my-personal-claude-code-data | 549 | Claude Code | Agentic Code |
| **Total** | **20,011** | | |
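The per-source counts in the table can be tallied with a few lines of Python. This is only a sketch: the grouping of rows into "coding" is an assumption based on the Type column (Coding Agent, Coding, Programming, Agentic Code), and every other row is counted as reasoning.

```python
# Example counts copied from the composition table: (source, examples, type).
SOURCES = [
    ("Hastagaras/Claude-Sonnet-X-Opus-4.6-Reasoning-small-500", 524, "Reasoning"),
    ("Crownelius/Opus-4.6-Reasoning-3300x", 2160, "Reasoning"),
    ("ykarout/Opus-4.6-reasoning-sft-12k", 4000, "Reasoning"),
    ("Jackrong/Qwen3.5-reasoning-700x", 633, "Reasoning (same-arch)"),
    ("Roman1111111/gemini-3.1-pro-hard-high-reasoning", 300, "Reasoning"),
    ("Roman1111111/gpt-5.4-step-by-step-reasoning", 1500, "Reasoning"),
    ("ianncity/KIMI-K2.5-700000x", 3000, "General + Reasoning"),
    ("TeichAI/gpt-5.2-high-reasoning-250x", 249, "Reasoning"),
    ("TeichAI/Claude-Sonnet-4.6-Reasoning-1100x", 1096, "Reasoning"),
    ("TeichAI/Hunter-Alpha-16k", 3000, "Coding Agent"),
    ("TeichAI/gpt-5.1-codex-max-1000x", 1000, "Coding"),
    ("ianncity/Hunter-Alpha-Programming-160000x", 2000, "Programming"),
    ("REXX-NEW/my-personal-claude-code-data", 549, "Agentic Code"),
]

# Assumption: these Type labels are the ones that count as coding/agent data.
CODING_TYPES = {"Coding Agent", "Coding", "Programming", "Agentic Code"}

total = sum(count for _, count, _ in SOURCES)
coding = sum(count for _, count, kind in SOURCES if kind in CODING_TYPES)
reasoning = total - coding

print(f"total={total} reasoning={reasoning} coding={coding}")
print(f"coding share: {coding / total:.1%}")
```

Under this grouping the KIMI K2.5 row ("General + Reasoning") is counted entirely as reasoning, so moving it changes the split noticeably.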
## Design Decisions
- **Multi-teacher diversity**: Examples from Claude, GPT, Gemini, Qwen, KIMI, and Hunter, to try to prevent single-teacher style collapse
- **Same-architecture distillation**: Jackrong/Qwen3.5 included for more natural knowledge transfer to Qwen-based target models (similar patterns were probably used when distilling the Qwen models)
- **Reasoning + Coding balance**: ~60% reasoning traces, ~40% coding/agent tasks
- **Subsetted large datasets**: Large sources are capped at 3K-4K examples each
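The per-source cap could be sketched as below. The card does not say how the subsets were drawn, so the seeded random sample here is an assumption, and `cap_per_source` is a hypothetical helper rather than part of any published pipeline.

```python
import random


def cap_per_source(examples_by_source, cap, seed=0):
    """Keep at most `cap` examples per source (hypothetical helper).

    Sources at or under the cap are kept whole; larger ones are
    down-sampled with a fixed seed so the subset is reproducible.
    """
    rng = random.Random(seed)
    capped = {}
    for source, examples in examples_by_source.items():
        if len(examples) <= cap:
            capped[source] = list(examples)
        else:
            capped[source] = rng.sample(examples, cap)
    return capped


# Toy data standing in for real sources of different sizes.
toy = {
    "small-source": [f"s{i}" for i in range(500)],
    "large-source": [f"l{i}" for i in range(12000)],
}
subset = cap_per_source(toy, cap=4000)
print({name: len(rows) for name, rows in subset.items()})
```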
## Format
ShareGPT conversational format:
```json
{
  "conversations": [
    {"from": "system", "value": "..."},
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "<think>\n...\n</think>\n..."}
  ]
}
```
Most assistant responses include `<think>...</think>` reasoning traces followed by the final answer.
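A minimal sketch of working with this format follows. It separates the `<think>` trace from the final answer and converts a ShareGPT-style record into `role`/`content` messages, which are the field names listed under the `messages` feature in the dataset metadata. The helper names and the role mapping (`human` to `user`, `gpt` to `assistant`) are assumptions based on common convention, not something the card specifies.

```python
import re

# Matches one <think>...</think> block plus surrounding whitespace.
THINK_RE = re.compile(r"<think>\s*(.*?)\s*</think>\s*", re.DOTALL)

# Assumed mapping from ShareGPT speaker names to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def split_thinking(text):
    """Return (thinking, answer); thinking is None if there is no trace."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return match.group(1), answer


def sharegpt_to_messages(record):
    """Convert {"conversations": [{"from", "value"}, ...]} to role/content dicts."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]


example = {
    "conversations": [
        {"from": "human", "value": "What is 6 * 7?"},
        {"from": "gpt", "value": "<think>\n6 * 7 = 42\n</think>\nThe answer is 42."},
    ]
}
messages = sharegpt_to_messages(example)
thinking, answer = split_thinking(messages[-1]["content"])
print(thinking)
print(answer)
```

Responses without a trace pass through unchanged, which matters because only most, not all, assistant turns include one.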
## Intended Use
Phase 2 SFT training for the TrueINt reasoning model pipeline. Designed to be used after identity/behavioral conditioning (Phase 1) and before reinforcement learning (Phase 3).
## Excluded
- Computer-use / vision datasets (reserved for Phase 3)
- Broken datasets: Roman1111111/claude-opus-4.6-10000x (broken Arrow schema, unable to load), TeichAI/Claude-Opus-4.6-Reasoning-887x (broken loader)
- TeichAI/Claude-Opus-Dataclaw-Unredacted (reserved for Phase 3 agent training)
## Citation
Individual source datasets retain their original licenses. This is a curated assembly for research purposes.
Provider:
Mudassir41



