ykarout/Opus-4.6-reasoning-sft-12k
Source: Hugging Face · Updated: 2026-04-01 · Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/ykarout/Opus-4.6-reasoning-sft-12k
Resource description:
---
pretty_name: Opus-4.6-reasoning-sft-12k
language:
- en
license: other
tags:
- reasoning
- chain-of-thought
- sft
- conversational
- qwen
- synthetic
- math
- unsloth
- opus
- claude
- openclaw
- qwen3.5
task_categories:
- text-generation
- question-answering
- reinforcement-learning
size_categories:
- 10K<n<100K
---
# Opus-4.6-reasoning-sft-12k
A unified conversational reasoning dataset built by combining and normalizing two Hugging Face datasets into a single training-ready schema for supervised fine-tuning.
## Overview
This dataset was created to support reasoning-focused SFT for chat models, especially Qwen-family conversational models and derivatives.
It unifies two source datasets into one consistent format:
- `Roman1111111/claude-opus-4.6-10000x`
- `Crownelius/Opus-4.6-Reasoning-3300x`
The final dataset is structured around a canonical `messages` field so it can be used directly in modern TRL / Unsloth SFT workflows.
## Why this dataset exists
The source datasets were useful but not fully aligned out of the box:
- one dataset used raw reasoning fields like `problem`, `thinking`, and `solution`
- the other used chat-style `messages`, but stored reasoning separately inside the assistant message
- one also included a repeated generic system prompt that was not useful to keep for every example
This dataset solves that by normalizing everything into one consistent conversational format with explicit reasoning preserved in the assistant reply.
## Source datasets
### 1) Roman1111111/claude-opus-4.6-10000x
Used as a conversational reasoning source.
Original structure included:
- `messages`
- `metadata`
Typical rows contained:
- a generic `system` message
- a `user` message
- an `assistant` message with:
  - `content`
  - `reasoning`
### 2) Crownelius/Opus-4.6-Reasoning-3300x
Used as a reasoning distillation source.
Original structure included fields such as:
- `problem`
- `thinking`
- `solution`
- `difficulty`
- `category`
- `id`
## Processing and enhancements
The following normalization and enhancement steps were applied.
### Crownelius normalization
Rows were converted from raw fields into chat conversations:
- `problem` -> user message
- `thinking` -> assistant reasoning
- `solution` -> assistant final answer
The assistant response was rewritten into the format:
```text
<think>
...
</think>

final answer
```
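The field mapping above can be sketched as a small conversion function. This is a minimal illustration, not the script actually used to build the dataset: the function name `crownelius_to_row`, the whitespace stripping, and the empty-string defaults for missing metadata are all assumptions.

```python
def crownelius_to_row(raw: dict) -> dict:
    """Convert one raw problem/thinking/solution record into a chat-style row.

    Hypothetical helper: stripping whitespace and defaulting missing
    metadata to "" are assumptions, not documented behavior.
    """
    assistant_content = (
        f"<think>\n{raw['thinking'].strip()}\n</think>\n\n{raw['solution'].strip()}"
    )
    raw_id = raw.get("id")
    return {
        "messages": [
            {"role": "user", "content": raw["problem"].strip()},
            {"role": "assistant", "content": assistant_content},
        ],
        "source": "crownelius",
        "difficulty": raw.get("difficulty") or "",
        "category": raw.get("category") or "",
        "example_id": "" if raw_id is None else str(raw_id),
    }


row = crownelius_to_row(
    {"problem": "What is 2 + 2?", "thinking": "2 + 2 = 4.", "solution": "4", "id": 7}
)
print(row["messages"][1]["content"])
```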
### Roman normalization
Rows were converted from the original chat-like format into the same canonical structure.
Enhancements:
- removed the repeated generic system prompt:
  - `You are a helpful AI assistant.`
- preserved the user message
- rebuilt the assistant message from:
  - `assistant.reasoning`
  - `assistant.content`
The assistant response was normalized into:
```text
<think>
...
</think>

final answer
```
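A rough sketch of the Roman normalization, assuming each upstream message is a dict with `role`, `content`, and (for the assistant) `reasoning` keys as described in the source structure above. The function name `normalize_roman_messages` and the exact-match test on the system prompt are illustrative assumptions.

```python
GENERIC_SYSTEM_PROMPT = "You are a helpful AI assistant."


def normalize_roman_messages(messages: list) -> list:
    """Drop the generic system prompt and fold reasoning into the assistant reply.

    Hypothetical helper; the real pipeline may differ in its details.
    """
    normalized = []
    for msg in messages:
        role = msg["role"]
        content = (msg.get("content") or "").strip()
        if role == "system" and content == GENERIC_SYSTEM_PROMPT:
            continue  # the repeated generic prompt carries no information
        if role == "assistant":
            reasoning = (msg.get("reasoning") or "").strip()
            content = f"<think>\n{reasoning}\n</think>\n\n{content}"
        normalized.append({"role": role, "content": content})
    return normalized


example = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello.", "reasoning": "Greet back."},
]
print(normalize_roman_messages(example))
```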
### Unified schema
Both datasets were normalized into the same final structure:
```python
{
    "messages": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "<think>\n...\n</think>\n\nfinal answer"}
    ],
    "source": "...",
    "difficulty": "...",
    "category": "...",
    "example_id": "...",
    "n_tokens": 123
}
```
```
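If you want to sanity-check rows against this schema before training, a validator along the following lines may help. It is only a sketch: `schema_problems` is a hypothetical helper, and it assumes every conversation is a single user turn followed by a single assistant turn, as in the schema shown above.

```python
import re

# Assistant content is expected to be "<think>\n...\n</think>\n\n<answer>".
THINK_PATTERN = re.compile(r"^<think>\n(.*?)\n</think>\n\n(.+)$", re.DOTALL)
REQUIRED_KEYS = {"messages", "source", "difficulty", "category", "example_id", "n_tokens"}


def schema_problems(row: dict) -> list:
    """Return a list of human-readable problems; an empty list means the row looks fine."""
    problems = []
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    messages = row.get("messages") or []
    roles = [m.get("role") for m in messages]
    if roles != ["user", "assistant"]:
        problems.append(f"unexpected roles: {roles}")
    elif not THINK_PATTERN.match(messages[1].get("content") or ""):
        problems.append("assistant content does not match the <think> format")
    return problems
```

Rows flagged by a check like this are candidates for the incomplete or truncated upstream examples mentioned in the notes.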
## Final dataset columns
- `messages`
  Canonical conversational training format.
- `source`
  Origin of the sample:
  - `roman`
  - `crownelius`
- `difficulty`
  Difficulty label from upstream data when available.
- `category`
  Category label from upstream data when available.
- `example_id`
  Original id when available.
- `n_tokens`
  Token count measured after rendering the conversation with the target tokenizer chat template.
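To recompute `n_tokens` for a different tokenizer, one possible approach is to render the conversation with the chat template and count the resulting token ids. The sketch below assumes a Hugging Face `transformers` tokenizer that ships a chat template; `count_rendered_tokens` is a hypothetical name, and the card does not say which tokenizer produced the stored values.

```python
def count_rendered_tokens(messages, tokenizer) -> int:
    """Render `messages` with the tokenizer's chat template and count tokens.

    `tokenizer` is assumed to be a transformers tokenizer with a chat template.
    """
    # Render to text first so the count does not depend on the return type
    # of apply_chat_template when tokenizing directly.
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    # The template already inserts its own special tokens.
    return len(tokenizer(text, add_special_tokens=False)["input_ids"])


# Usage (requires transformers and a model download; the model id is a placeholder):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("<your-model-id>")
# n = count_rendered_tokens(row["messages"], tokenizer)
```

Counts obtained this way will generally differ from the stored `n_tokens` unless the same tokenizer and template are used.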
## Token length profile
Token lengths were measured after rendering the normalized conversations with the target tokenizer chat template.
Combined dataset statistics:
- **count:** 11,791
- **p50:** 255
- **p90:** 922
- **p95:** 1141
- **p99:** 1805
- **max:** 7569
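With a maximum of 7569 tokens, every example fits within an `8192`-token context window without truncation, provided your tokenizer and chat template produce lengths similar to those measured here.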
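A profile like the one above can be reproduced from the `n_tokens` column with a few lines of Python. This sketch uses the nearest-rank percentile definition, which is an assumption: the card does not state which method was used, so results may differ slightly from the published figures. Both function names are hypothetical.

```python
def nearest_rank_percentile(sorted_values, pct: int):
    """Nearest-rank percentile of an ascending list (pct is an integer 1..100)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def length_profile(n_tokens) -> dict:
    """Summarize token lengths as count, selected percentiles, and max."""
    values = sorted(n_tokens)
    profile = {"count": len(values)}
    for pct in (50, 90, 95, 99):
        profile[f"p{pct}"] = nearest_rank_percentile(values, pct)
    profile["max"] = values[-1]
    return profile


# Usage with the loaded dataset (column access returns a plain list of ints):
# print(length_profile(train_ds["n_tokens"]))
```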
## Applicability
This dataset is well suited for:
- supervised fine-tuning of reasoning-capable chat models
- preserving explicit reasoning traces during SFT
- training models to answer with both intermediate reasoning and a final answer
- math, logic, QA, short analytical tasks, and structured problem solving
- Qwen-family chat models and compatible conversational SFT pipelines
It is especially useful when you want a `messages`-based dataset that can be fed directly into:
- TRL `SFTTrainer`
- Unsloth conversational SFT workflows
- chat-template-aware training pipelines
## Training format recommendation
Use the `messages` column as the canonical source format.
Recommended approach:
1. load the dataset
2. let the model tokenizer apply its own chat template
3. train on assistant messages only if desired
4. keep the `<think>...</think>` structure intact
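When evaluating a fine-tuned model, it can be handy to check that its outputs still follow the `<think>...</think>` structure and to separate the reasoning from the final answer. The helper below is a minimal sketch; `split_reasoning` is a hypothetical name, and returning an empty reasoning string for untagged text is a design assumption.

```python
def split_reasoning(content: str):
    """Split an assistant reply into (reasoning, final_answer).

    Returns ("", content) when the <think>...</think> wrapper is missing or
    unterminated, so callers can detect replies that lost the structure.
    """
    head, sep, tail = content.partition("</think>")
    head = head.lstrip()
    if not sep or not head.startswith("<think>"):
        return "", content.strip()
    reasoning = head[len("<think>"):].strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("<think>\n2 + 2 = 4.\n</think>\n\n4")
print(repr(reasoning), repr(answer))
```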
## Example
```python
{
    "messages": [
        {"role": "user", "content": "Ken created a care package to send to his brother..."},
        {
            "role": "assistant",
            "content": "<think>\nLet me work through this step by step.\n\n1. Box on scale...\n</think>\n\n16 pounds. Starting at 2 lbs, tripling gives 6 lbs..."
        }
    ],
    "source": "roman",
    "difficulty": "medium",
    "category": "simple logic and math",
    "example_id": "",
    "n_tokens": 380
}
```
## How to load
```python
from datasets import load_dataset

# Downloads the dataset from the Hugging Face Hub and returns a DatasetDict.
ds = load_dataset("ykarout/Opus-4.6-reasoning-sft-12k")

train_ds = ds["train"]
validation_ds = ds["validation"]
```
## Suggested usage with SFT
```python
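from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

ds = load_dataset("ykarout/Opus-4.6-reasoning-sft-12k")

trainer = SFTTrainer(
    # Example text-only Qwen checkpoint; swap in the chat model you want to tune.
    model="Qwen/Qwen3-8B",
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    args=SFTConfig(
        output_dir="out",
        # Compute the loss on assistant tokens only. TRL needs a chat template
        # with {% generation %} markers for this option to work.
        assistant_only_loss=True,
    ),
)
trainer.train()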
```
## Notes
- The dataset intentionally preserves explicit reasoning text.
- A small number of incomplete or partially truncated upstream examples may remain.
- In practice, these are rare and can also serve as a useful signal for handling incomplete inputs cautiously.
- `n_tokens` should be treated as a helpful reference column tied to the tokenizer/template used during measurement.
## Attribution
This dataset is derived from and would not exist without the original work by the source dataset creators:
- `Roman1111111/claude-opus-4.6-10000x`
- `Crownelius/Opus-4.6-Reasoning-3300x`
Please give credit to the original dataset authors when using or redistributing derivatives.
## License and usage considerations
This dataset is a processed derivative of upstream datasets. Please review the source dataset pages and their licenses / usage terms before commercial or large-scale downstream use.
Provider:
ykarout



