Jongsim/claude-opus-4.6-reasoning-12k-en-filtered-v2
收藏Hugging Face2026-04-04 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Jongsim/claude-opus-4.6-reasoning-12k-en-filtered-v2
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
license: apache-2.0
size_categories:
- 10K<n<100K
task_categories:
- text-generation
tags:
- reasoning
- chain-of-thought
- filtered
- claude
- opus
- claude-only
dataset_info:
features:
- name: id
dtype: string
- name: source
dtype: string
- name: messages
dtype: string
- name: domain
dtype: string
- name: difficulty
dtype: string
- name: teacher_model
dtype: string
splits:
- name: train
num_examples: 12126
dataset_size: 11600000
configs:
- config_name: default
data_files:
- split: train
path: "*.parquet"
---
# Claude Opus Reasoning 12K - English (Filtered v2, Claude-Only)
A **strictly filtered** version of [Jongsim/claude-opus-4.6-reasoning-12k-en-filtered](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-en-filtered).
This dataset contains **12,126 high-quality reasoning examples** generated exclusively by Claude Opus models. All non-Claude data (Qwen-generated) has been removed.
## What changed from Filtered v1
| Version | Entries | Description |
|---------|---------|-------------|
| Original | 12,842 | Raw merged dataset |
| Filtered v1 | 12,757 | Refusal + empty response removal |
| **Filtered v2** | **12,126** | v1 + Qwen data removed (Claude-only) |
### v2 Changes
- Removed **631 entries** from `Jackrong/Qwen3.5-reasoning-700x` source
- Dataset now contains **only Claude Opus-generated** reasoning data
- All v1 quality filters still applied (refusal removal, empty response cleanup, broken translation recovery)
## Source Distribution
| Source | Count | Percentage |
|--------|-------|------------|
| Roman1111111/claude-opus-4.6-10000x | 9,601 | 79.2% |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,276 | 18.8% |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 249 | 2.1% |
## Domain Distribution
| Domain | Count | Percentage |
|--------|-------|------------|
| simple logic and math | 7,468 | 61.6% |
| math | 4,143 | 34.2% |
| code | 266 | 2.2% |
> Note: `science` and `instruction_following` domains were entirely from Qwen source and have been removed.
## Difficulty Distribution
| Difficulty | Count |
|------------|-------|
| medium | 11,749 |
| phd | 69 |
| hard | 59 |
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `id` | string | Unique identifier |
| `source` | string | Original dataset source |
| `messages` | string | JSON array of `{role, content}` pairs (user/assistant) |
| `domain` | string | Task domain (math, code) |
| `difficulty` | string | Difficulty level (medium, hard, phd) |
| `teacher_model` | string | Model used to generate the response (`claude-opus-4.6`) |
## Filtering Process
1. **v1 filters**: Refusal detection (58), empty response removal (26), broken translation recovery (11)
2. **v2 filter**: Qwen-sourced data removal (631) — ensures pure Claude Opus reasoning only
3. **Model-based verification**: Gemma 4 26B validation (50 samples, 0 additional issues found)
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("Jongsim/claude-opus-4.6-reasoning-12k-en-filtered-v2")
```
## Related Datasets
| Dataset | Language | Entries | Description |
|---------|----------|---------|-------------|
| [claude-opus-4.6-reasoning-12k](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k) | EN | 12,842 | Original |
| [claude-opus-4.6-reasoning-12k-ko](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko) | KO | 12,842 | Korean translation |
| [claude-opus-4.6-reasoning-12k-en-filtered](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-en-filtered) | EN | 12,757 | Filtered v1 |
| [claude-opus-4.6-reasoning-12k-ko-filtered](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko-filtered) | KO | 12,757 | Filtered v1 Korean |
| **claude-opus-4.6-reasoning-12k-en-filtered-v2** | **EN** | **12,126** | **Filtered v2 (this)** |
| [claude-opus-4.6-reasoning-12k-ko-filtered-v2](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko-filtered-v2) | KO | 12,126 | Filtered v2 Korean |
## License
Apache 2.0
提供机构:
Jongsim



