Verdugie/opus-4.6-training-catalog
Source: Hugging Face · Updated 2026-03-24 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Verdugie/opus-4.6-training-catalog
Resource summary:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- claude
- opus-4.6
- reasoning
- coding
- conversation
- distillation
- synthetic
size_categories:
- 100K<n<1M
---
# Opus 4.6 Community Training Catalog
A curated, cleaned, and deduplicated collection of community-created Claude Opus 4.6 distillation datasets from Hugging Face. It contains reasoning traces, coding examples, and conversational data generated by `claude-opus-4-6`.
## Dataset Summary
| Metric | Value |
|--------|-------|
| **Total conversations** | 168,301 |
| **Format** | JSONL — ShareGPT-style (messages array with role/content) |
| **Size** | ~154 MB |
| **Language** | English |
## Splits
| Split | Rows | Size | Description |
|-------|------|------|-------------|
| `reasoning.jsonl` | 166,698 | 133 MB | Reasoning, math, logic, chain-of-thought traces |
| `coding.jsonl` | 662 | 19 MB | Programming, software engineering, high-reasoning coding |
| `conversation.jsonl` | 941 | 1.8 MB | Relational conversation, stance distillation |
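After downloading, a quick row count can confirm that the local files match the Splits table. The sketch below is a minimal check. It assumes that the three split files sit together in one local directory (`data_dir` is a hypothetical parameter) and that each non-blank line holds one JSON object, as the JSONL format implies.

```python
from pathlib import Path

# Row counts as published in the Splits table of this card.
EXPECTED_ROWS = {
    "reasoning.jsonl": 166_698,
    "coding.jsonl": 662,
    "conversation.jsonl": 941,
}


def count_rows(lines):
    """Count non-blank lines in an iterable of JSONL lines (e.g. an open file)."""
    return sum(1 for line in lines if line.strip())


def check_splits(data_dir):
    """Return {file_name: (actual, expected)} for every split whose count differs.

    `data_dir` is a hypothetical local folder holding the three downloaded files.
    """
    mismatches = {}
    for name, expected in EXPECTED_ROWS.items():
        with open(Path(data_dir) / name, encoding="utf-8") as f:
            actual = count_rows(f)
        if actual != expected:
            mismatches[name] = (actual, expected)
    return mismatches


# Sanity check on the card itself: the splits add up to the stated total.
assert sum(EXPECTED_ROWS.values()) == 168_301
```

If the local copy matches the table, `check_splits` should return an empty dict. Any entry it does return shows which file differs and by how much.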
## Source Datasets (6 verified)
Every dataset was verified by reading its Hugging Face README to confirm that Claude Opus 4.6 is explicitly stated as the generation model.
| # | Source | Rows in catalog | Split | License |
|---|--------|------|-------|---------|
| 1 | [owenisas/opus46-reasoning-mix-full](https://huggingface.co/datasets/owenisas/opus46-reasoning-mix-full) | 156,293 | reasoning | Unspecified |
| 2 | [Roman1111111/claude-opus-4.6-10000x](https://huggingface.co/datasets/Roman1111111/claude-opus-4.6-10000x) | 7,464 | reasoning | MIT |
| 3 | [LEGENDQ/Claude-Opus-4.6-Reasoning-Dataset](https://huggingface.co/datasets/LEGENDQ/Claude-Opus-4.6-Reasoning-Dataset) | 2,056 | reasoning | Apache-2.0 |
| 4 | [TeichAI/Claude-Opus-4.6-Reasoning-927x](https://huggingface.co/datasets/TeichAI/Claude-Opus-4.6-Reasoning-927x) | 885 | reasoning | Apache-2.0 |
| 5 | [dalisoft/claude-opus-4.6-high-reasoning-700x](https://huggingface.co/datasets/dalisoft/claude-opus-4.6-high-reasoning-700x) | 662 | coding | Apache-2.0 |
| 6 | [aptgetupdate/Claude-Opus-4.6-stance-distilled-RELATIONAL](https://huggingface.co/datasets/aptgetupdate/Claude-Opus-4.6-stance-distilled-RELATIONAL) | 941 | conversation | CC-BY-SA-4.0 |
## Data Format
Each row is a JSON object with a unified schema:
```json
{
  "source": "owenisas",
  "topic": "reasoning",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
```
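The `source` field identifies the original dataset creator, and the `topic` field indicates the content category. Messages use the `role`/`content` chat layout. This is often called ShareGPT-style, although the original ShareGPT format stores turns under `conversations` with `from`/`value` keys. Most fine-tuning frameworks (Axolotl, Unsloth, LLaMA-Factory, etc.) can load the `role`/`content` layout.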
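A light schema check is worth running before training. Some tools also expect the original ShareGPT layout (`conversations` with `from`/`value`) instead of `messages` with `role`/`content`. The sketch below validates one parsed row and converts it to that layout. The helper names `validate_row` and `to_classic_sharegpt` are hypothetical. The allowed role set is an assumption drawn from the example row above, and the `human`/`gpt` role names follow common ShareGPT usage rather than any tooling shipped with this catalog.

```python
# Roles seen in the example row; treating this as the complete set is an assumption.
ALLOWED_ROLES = {"system", "user", "assistant"}

# Common (not universal) role names in the original ShareGPT layout.
SHAREGPT_ROLE = {"system": "system", "user": "human", "assistant": "gpt"}


def validate_row(row):
    """Return a list of problems found in one catalog row (empty if valid)."""
    problems = []
    for key in ("source", "topic"):
        if not isinstance(row.get(key), str):
            problems.append(f"missing or non-string '{key}'")
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("'messages' must be a non-empty list")
        return problems
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    return problems


def to_classic_sharegpt(row):
    """Rewrite a valid row into the {'conversations': [{'from', 'value'}]} layout."""
    return {
        "source": row["source"],
        "topic": row["topic"],
        "conversations": [
            {"from": SHAREGPT_ROLE[m["role"]], "value": m["content"]}
            for m in row["messages"]
        ],
    }
```

Call `validate_row` on each parsed line first, and convert only the rows that come back with an empty problem list. `to_classic_sharegpt` assumes that its input has already passed validation.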
## Cleaning Methodology
1. **Collection**: An exhaustive Hugging Face Hub search was run with queries such as `claude opus 4.6`, `opus-4-6`, `claude-opus-4-6`, and `opus46`.
2. **Verification**: Each dataset's README was manually checked to confirm Opus 4.6 as the generation model. 17 candidates were rejected (forks, mixed-model datasets, unconfirmed provenance, or duplicates).
3. **Format normalization**: All datasets were converted to a unified `{source, topic, messages}` schema.
4. **Deduplication**: Exact and near-duplicate conversations were removed across datasets.
5. **Quality filtering**: Empty rows, malformed messages, and garbage content were removed.
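The card does not say how near-duplicates were detected or what counted as garbage content. The following sketch therefore only illustrates steps 4 and 5 and is not the curator's pipeline. It uses hash-based exact matching: it keeps the first copy of each conversation after lowercasing and collapsing whitespace, which catches trivial variants but not paraphrases. It also drops rows with blank content or no assistant turn. These filter rules are assumptions.

```python
import hashlib
import re


def _normalize(text):
    """Lowercase and collapse whitespace so trivial formatting changes compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(row):
    """SHA-256 of the normalized role/content sequence of one conversation."""
    parts = [f"{m['role']}:{_normalize(m['content'])}" for m in row["messages"]]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def is_well_formed(row):
    """Assumed quality rules: non-empty message list, string fields,
    no blank content, and at least one assistant turn."""
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for m in messages:
        if not isinstance(m, dict):
            return False
        role, content = m.get("role"), m.get("content")
        if not isinstance(role, str) or not isinstance(content, str):
            return False
        if not content.strip():
            return False
    return any(m["role"] == "assistant" for m in messages)


def clean(rows):
    """Drop malformed rows, then keep only the first copy of each fingerprint."""
    seen = set()
    kept = []
    for row in rows:
        if not is_well_formed(row):
            continue
        fp = fingerprint(row)
        if fp in seen:
            continue
        seen.add(fp)
        kept.append(row)
    return kept
```

Because the first occurrence wins, the order in which sources are fed to `clean` decides which dataset keeps a shared conversation. Catching true near-duplicates would need a similarity technique such as MinHash over word shingles, which this sketch does not attempt.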
## Intended Use
Fine-tuning and distillation of open-source language models on high-quality Opus 4.6 reasoning traces, coding examples, and conversational patterns. Suitable for LoRA/QLoRA training on 7B–72B parameter models targeting improved reasoning and instruction-following.
## Limitations
- All data is synthetically generated by Claude Opus 4.6. It inherits any biases or limitations of the source model.
- The dataset is English-only.
- Licensing varies by source dataset — check individual source licenses for specific use cases.
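Because licenses differ per source, one option is to filter rows by the license of the dataset they came from. The sketch below encodes the Source Datasets table as a lookup. Keying it on the Hugging Face username is an assumption based on the example row (`"source": "owenisas"`), so check the actual `source` values in the files before relying on it. The code only reproduces the table and does not interpret what each license permits.

```python
from collections import Counter

# License per source, copied from the Source Datasets table.
# ASSUMPTION: the `source` field holds the Hugging Face username, as in the
# example row ("source": "owenisas"). Verify against the real files.
SOURCE_LICENSES = {
    "owenisas": "Unspecified",
    "Roman1111111": "MIT",
    "LEGENDQ": "Apache-2.0",
    "TeichAI": "Apache-2.0",
    "dalisoft": "Apache-2.0",
    "aptgetupdate": "CC-BY-SA-4.0",
}


def filter_by_license(rows, allowed):
    """Keep rows whose source maps to a license in `allowed`.

    Rows with an unrecognized source are dropped rather than guessed.
    """
    return [r for r in rows if SOURCE_LICENSES.get(r.get("source")) in allowed]


def license_tally(rows):
    """Count rows per license; unrecognized sources are reported as 'Unknown'."""
    return Counter(SOURCE_LICENSES.get(r.get("source"), "Unknown") for r in rows)
```

For example, `filter_by_license(rows, {"MIT", "Apache-2.0"})` would keep only rows from the MIT- and Apache-2.0-licensed sources. Running `license_tally` first shows how much data each choice would discard.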
## Citation
Curated by [Verdugie](https://huggingface.co/Verdugie). Original data created by the respective dataset authors listed above.
Provided by:
Verdugie



