ykarout/Opus-4.6-reasoning-sft-12k

Name: ykarout/Opus-4.6-reasoning-sft-12k
Creator: ykarout
Published: 2026-04-01 12:34:33
License: No description available

Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/ykarout/Opus-4.6-reasoning-sft-12k

Resource description:

---
pretty_name: Opus-4.6-reasoning-sft-12k
language:
- en
license: other
tags:
- reasoning
- chain-of-thought
- sft
- conversational
- qwen
- synthetic
- math
- unsloth
- opus
- claude
- openclaw
- qwen3.5
task_categories:
- text-generation
- question-answering
- reinforcement-learning
size_categories:
- 10K<n<100K
---

# Opus-4.6-reasoning-sft-12k

A unified conversational reasoning dataset built by combining and normalizing two Hugging Face datasets into a single training-ready schema for supervised fine-tuning.

## Overview

This dataset was created to support reasoning-focused SFT for chat models, especially Qwen-family conversational models and derivatives.

It unifies two source datasets into one consistent format:

- `Roman1111111/claude-opus-4.6-10000x`
- `Crownelius/Opus-4.6-Reasoning-3300x`

The final dataset is structured around a canonical `messages` field so it can be used directly in modern TRL / Unsloth SFT workflows.

## Why this dataset exists

The source datasets were useful but not fully aligned out of the box:

- one dataset used raw reasoning fields like `problem`, `thinking`, and `solution`
- the other used chat-style `messages`, but stored reasoning separately inside the assistant message
- one also included a repeated generic system prompt that was not useful to keep for every example

This dataset solves that by normalizing everything into one consistent conversational format with explicit reasoning preserved in the assistant reply.

## Source datasets

### 1) Roman1111111/claude-opus-4.6-10000x

Used as a conversational reasoning source.

Original structure included:

- `messages`
- `metadata`

Typical rows contained:

- a generic `system` message
- a `user` message
- an `assistant` message with:
  - `content`
  - `reasoning`

### 2) Crownelius/Opus-4.6-Reasoning-3300x

Used as a reasoning distillation source.

Original structure included fields such as:

- `problem`
- `thinking`
- `solution`
- `difficulty`
- `category`
- `id`

## Processing and enhancements

The following normalization and enhancement steps were applied.

### Crownelius normalization

Rows were converted from raw fields into chat conversations:

- `problem` -> user message
- `thinking` -> assistant reasoning
- `solution` -> assistant final answer

The assistant response was rewritten into the format:

```text
<think>
...
</think>

final answer
```

### Roman normalization

Rows were converted from the original chat-like format into the same canonical structure.

Enhancements:

- removed the repeated generic system prompt:
  - `You are a helpful AI assistant.`
- preserved the user message
- rebuilt the assistant message from:
  - `assistant.reasoning`
  - `assistant.content`

The assistant response was normalized into:

```text
<think>
...
</think>

final answer
```

### Unified schema

Both datasets were normalized into the same final structure:

```python
{
    "messages": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "<think>\n...\n</think>\n\nfinal answer"}
    ],
    "source": "...",
    "difficulty": "...",
    "category": "...",
    "example_id": "...",
    "n_tokens": 123
}
```

## Final dataset columns

- `messages`
  Canonical conversational training format.
- `source`
  Origin of the sample:
  - `roman`
  - `crownelius`
- `difficulty`
  Difficulty label from upstream data when available.
- `category`
  Category label from upstream data when available.
- `example_id`
  Original id when available.
- `n_tokens`
  Token count measured after rendering the conversation with the target tokenizer chat template.

## Token length profile

Token lengths were measured after rendering the normalized conversations with the target tokenizer chat template.

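A measurement of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the script used to build the dataset: the helper names `count_rendered_tokens` and `fits_context` are hypothetical, and the card does not name the tokenizer that was used. The only calls relied on are `apply_chat_template` and the tokenizer's `__call__` from the `transformers` tokenizer interface.

```python
from typing import Any, Dict, List


def count_rendered_tokens(messages: List[Dict[str, str]], tokenizer: Any) -> int:
    """Render a conversation with the tokenizer's chat template and count tokens.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer that has a
    chat template configured.
    """
    # tokenize=False returns the rendered conversation as a plain string.
    rendered = tokenizer.apply_chat_template(messages, tokenize=False)
    # The template already inserts its special tokens, so none are added again.
    return len(tokenizer(rendered, add_special_tokens=False)["input_ids"])


def fits_context(example: Dict[str, Any], max_len: int = 8192) -> bool:
    """Check a row's stored `n_tokens` value against a context budget."""
    return example["n_tokens"] <= max_len


# Hypothetical usage (needs `transformers` and a downloaded tokenizer):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("<your-target-model>")
#   n = count_rendered_tokens(row["messages"], tok)
```

Counts produced this way depend on the tokenizer and template, so they may differ from the stored `n_tokens` values if a different model family is used.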
Combined dataset statistics:

- **count:** 11,791
- **p50:** 255
- **p90:** 922
- **p95:** 1141
- **p99:** 1805
- **max:** 7569

Because the longest example is 7,569 tokens, every example fits within an `8192`-token context window without truncation when the same tokenizer and chat template are used.

## Applicability

This dataset is well suited for:

- supervised fine-tuning of reasoning-capable chat models
- preserving explicit reasoning traces during SFT
- training models to answer with both intermediate reasoning and a final answer
- math, logic, QA, short analytical tasks, and structured problem solving
- Qwen-family chat models and compatible conversational SFT pipelines

It is especially useful when you want a `messages`-based dataset that can be fed directly into:

- TRL `SFTTrainer`
- Unsloth conversational SFT workflows
- chat-template-aware training pipelines

## Training format recommendation

Use the `messages` column as the canonical source format.

Recommended approach:

1. Load the dataset.
2. Let the model tokenizer apply its own chat template.
3. Optionally compute the loss on assistant messages only.
4. Keep the `<think>...</think>` structure intact.

## Example

```python
{
    "messages": [
        {"role": "user", "content": "Ken created a care package to send to his brother..."},
        {
            "role": "assistant",
            "content": "<think>\nLet me work through this step by step.\n\n1. Box on scale...\n</think>\n\n16 pounds. Starting at 2 lbs, tripling gives 6 lbs..."
        }
    ],
    "source": "roman",
    "difficulty": "medium",
    "category": "simple logic and math",
    "example_id": "",
    "n_tokens": 380
}
```

## How to load

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub.
ds = load_dataset("ykarout/Opus-4.6-reasoning-sft-12k")

train_ds = ds["train"]
validation_ds = ds["validation"]
```

## Suggested usage with SFT

```python
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

ds = load_dataset("ykarout/Opus-4.6-reasoning-sft-12k")

trainer = SFTTrainer(
    model="Qwen/Qwen3-VL-8B",  # swap in the model you intend to fine-tune
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    args=SFTConfig(
        output_dir="out",
        # Compute the loss on assistant turns only; this relies on a chat
        # template that can mark assistant tokens.
        assistant_only_loss=True,
    ),
)

trainer.train()
```

## Notes

- The dataset intentionally preserves explicit reasoning text.
- A small number of incomplete or partially truncated upstream examples may remain. In practice, these are rare and can also serve as a useful signal for handling incomplete inputs cautiously.
- `n_tokens` should be treated as a helpful reference column tied to the tokenizer/template used during measurement.

## Attribution

This dataset is derived from and would not exist without the original work by the source dataset creators:

- `Roman1111111/claude-opus-4.6-10000x`
- `Crownelius/Opus-4.6-Reasoning-3300x`

Please give credit to the original dataset authors when using or redistributing derivatives.

## License and usage considerations

This dataset is a processed derivative of upstream datasets. Please review the source dataset pages and their licenses / usage terms before commercial or large-scale downstream use.
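
## Normalization sketch

The two normalization passes described under "Processing and enhancements" can be illustrated with the minimal sketch below. It is not the author's actual script. The function names are hypothetical, the stripping of surrounding whitespace is an assumption, and so is the idea that Roman's `metadata` carries `difficulty`, `category`, and `id` keys. `n_tokens` is left out because it depends on the target tokenizer.

```python
from typing import Any, Dict

# Generic system prompt that the card says was removed from Roman rows.
GENERIC_SYSTEM_PROMPT = "You are a helpful AI assistant."


def build_assistant_content(reasoning: str, answer: str) -> str:
    """Wrap reasoning in <think> tags and append the final answer."""
    return f"<think>\n{reasoning.strip()}\n</think>\n\n{answer.strip()}"


def normalize_crownelius(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map problem/thinking/solution fields onto the unified schema."""
    return {
        "messages": [
            {"role": "user", "content": row["problem"]},
            {
                "role": "assistant",
                "content": build_assistant_content(row["thinking"], row["solution"]),
            },
        ],
        "source": "crownelius",
        "difficulty": row.get("difficulty", ""),
        "category": row.get("category", ""),
        "example_id": str(row.get("id", "")),
    }


def normalize_roman(row: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the generic system prompt and merge reasoning into the reply."""
    messages = []
    for msg in row["messages"]:
        if msg["role"] == "system" and msg["content"].strip() == GENERIC_SYSTEM_PROMPT:
            continue  # skip the repeated generic prompt
        if msg["role"] == "assistant":
            content = build_assistant_content(msg.get("reasoning", ""), msg["content"])
            messages.append({"role": "assistant", "content": content})
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    metadata = row.get("metadata") or {}  # key names below are assumptions
    return {
        "messages": messages,
        "source": "roman",
        "difficulty": metadata.get("difficulty", ""),
        "category": metadata.get("category", ""),
        "example_id": str(metadata.get("id", "")),
    }
```

Keeping the `<think>` wrapping in one helper means both sources produce byte-identical framing, which matters if a downstream pipeline parses the reasoning block by its tags.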

Provider:

ykarout
