Mudassir41/ins_re-tuning

Name: Mudassir41/ins_re-tuning
Creator: Mudassir41
Published: 2026-04-05 09:50:24
License: No description provided

Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/Mudassir41/ins_re-tuning

Description:

---
dataset_info:
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
    - name: thinking
      dtype: string
  splits:
  - name: train
    num_bytes: 179310054.0
    num_examples: 19462
  download_size: 79547638
  dataset_size: 179310054.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- reasoning
- sft
- distillation
- thinking
- code
size_categories:
- 10K<n<100K
---

# Instruction & Reasoning Tuning Dataset

A curated mix of ~20K reasoning and coding examples from multiple teacher models, assembled for supervised fine-tuning of small language models (0.8B–4B parameters).

Qwen3.5 has robust reasoning training, likely from RL in deterministic, programmatic logic environments. Out of the box, however, it is not ideal for practical use and needs further RL. This applies especially to small models, which must learn not to reason exhaustively or fall into loops, since most of what they see is novel to them. Phase 2 therefore examines how the model's reasoning changes when it is trained on these patterns from SOTA models, starting from the final instruction-tuned checkpoint plus a private fine-tuning dataset (~10%).

## Dataset Composition

| Source | Examples | Teacher Model | Type |
|---|---|---|---|
| Hastagaras/Claude-Sonnet-X-Opus-4.6-Reasoning-small-500 | 524 | Claude 4.6 | Reasoning |
| Crownelius/Opus-4.6-Reasoning-3300x | 2,160 | Claude 4.6 | Reasoning |
| ykarout/Opus-4.6-reasoning-sft-12k | 4,000 | Claude 4.6 | Reasoning |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Qwen 3.5 | Reasoning (same-arch) |
| Roman1111111/gemini-3.1-pro-hard-high-reasoning | 300 | Gemini 3.1 | Reasoning |
| Roman1111111/gpt-5.4-step-by-step-reasoning | 1,500 | GPT 5.4 | Reasoning |
| ianncity/KIMI-K2.5-700000x | 3,000 | KIMI K2.5 | General + Reasoning |
| TeichAI/gpt-5.2-high-reasoning-250x | 249 | GPT 5.2 | Reasoning |
| TeichAI/Claude-Sonnet-4.6-Reasoning-1100x | 1,096 | Claude 4.6 | Reasoning |
| TeichAI/Hunter-Alpha-16k | 3,000 | Hunter Alpha | Coding Agent |
| TeichAI/gpt-5.1-codex-max-1000x | 1,000 | GPT 5.1 | Coding |
| ianncity/Hunter-Alpha-Programming-160000x | 2,000 | Hunter Alpha | Programming |
| REXX-NEW/my-personal-claude-code-data | 549 | Claude Code | Agentic Code |
| **Total** | **20,011** | | |

## Design Decisions

- **Multi-teacher diversity**: Examples from Claude, GPT, Gemini, Qwen, KIMI, and Hunter, as an attempt to prevent single-teacher style collapse
- **Same-architecture distillation**: Jackrong/Qwen3.5 is included for more natural knowledge transfer to Qwen-based target models (similar patterns were probably used when distilling Qwen models)
- **Reasoning + Coding balance**: ~60% reasoning traces, ~40% coding/agent tasks
- **Subsetted large datasets**: Capped at 3K-4K per source

## Format

ShareGPT conversational format:

```json
{
  "conversations": [
    {"from": "system", "value": "..."},
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "<think>\n...\n</think>\n..."}
  ]
}
```

Most assistant responses include `<think>...</think>` reasoning traces followed by the final answer.

## Intended Use

Phase 2 SFT training for the TrueINt reasoning model pipeline. Designed to be used after identity/behavioral conditioning (Phase 1) and before reinforcement learning (Phase 3).

## Excluded

- Computer-use / vision datasets (reserved for Phase 3)
- Broken datasets: Roman1111111/claude-opus-4.6-10000x (broken Arrow schema, unable to load), TeichAI/Claude-Opus-4.6-Reasoning-887x (broken loader)
- TeichAI/Claude-Opus-Dataclaw-Unredacted (reserved for Phase 3 agent training)

## Citation

Individual source datasets retain their original licenses. This is a curated assembly for research purposes.
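The Format section describes ShareGPT records with an inline `<think>...</think>` trace, while the dataset metadata lists a `messages` feature with `role`, `content`, and `thinking` fields. The card does not say how one maps to the other, so the following is only a minimal sketch of one plausible conversion. The role mapping (`human` to `user`, `gpt` to `assistant`) and the helper names `split_thinking` and `sharegpt_to_messages` are assumptions for illustration, not the author's pipeline.

```python
import re

# Matches a leading <think>...</think> block; DOTALL lets the trace span lines.
THINK_RE = re.compile(r"^\s*<think>\s*(.*?)\s*</think>\s*", re.DOTALL)

# Assumed mapping from ShareGPT speaker labels to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def split_thinking(value: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is "" when no trace is present."""
    match = THINK_RE.match(value)
    if match is None:
        return "", value.strip()
    return match.group(1), value[match.end():].strip()


def sharegpt_to_messages(record: dict) -> dict:
    """Convert one ShareGPT record into a role/content/thinking message list."""
    messages = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        if role == "assistant":
            # Only assistant turns are expected to carry a reasoning trace.
            thinking, content = split_thinking(turn["value"])
        else:
            thinking, content = "", turn["value"]
        messages.append({"role": role, "content": content, "thinking": thinking})
    return {"messages": messages}


# Hypothetical record, made up for illustration only.
example = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "What is 2 + 2?"},
        {"from": "gpt", "value": "<think>\nAdd the two numbers.\n</think>\n4"},
    ]
}

converted = sharegpt_to_messages(example)
```

Assistant turns without a `<think>` block pass through with an empty `thinking` string, which matches the card's note that most, but not all, responses include a trace.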

Provided by:

Mudassir41
