Jongsim/claude-opus-4.6-reasoning-12k-en-filtered
收藏Hugging Face2026-04-04 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Jongsim/claude-opus-4.6-reasoning-12k-en-filtered
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
license: apache-2.0
size_categories:
- 10K<n<100K
task_categories:
- text-generation
tags:
- reasoning
- chain-of-thought
- filtered
- claude
- opus
dataset_info:
features:
- name: id
dtype: string
- name: source
dtype: string
- name: messages
dtype: string
- name: domain
dtype: string
- name: difficulty
dtype: string
- name: teacher_model
dtype: string
splits:
- name: train
num_examples: 12757
dataset_size: 24010000
configs:
- config_name: default
data_files:
- split: train
path: "*.parquet"
---
# Claude Opus Reasoning 12K - English (Filtered)
A **filtered and quality-assured** version of [Jongsim/claude-opus-4.6-reasoning-12k](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k).
This dataset contains **12,757 high-quality reasoning examples** generated by Claude Opus, with refusal responses removed and broken entries cleaned up.
## Filtering Summary
| Change | Count | Description |
|--------|-------|-------------|
| Refusal responses removed | 58 | AI responses refusing requests, citing safety guidelines, or declining tasks |
| Empty/broken responses removed | 26 | Entries with missing or extremely short assistant responses |
| **Total removed** | **85** | |
| **Final dataset** | **12,757** | From original 12,842 |
## Dataset Details
### Source Distribution
| Source | Count | Percentage |
|--------|-------|------------|
| Roman1111111/claude-opus-4.6-10000x | 9,601 | 75.3% |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,276 | 17.8% |
| Jackrong/Qwen3.5-reasoning-700x | 631 | 4.9% |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 249 | 2.0% |
### Domain Distribution
| Domain | Count | Percentage |
|--------|-------|------------|
| simple logic and math | 7,468 | 58.5% |
| math | 4,372 | 34.3% |
| code | 364 | 2.9% |
| science | 166 | 1.3% |
| instruction_following | 138 | 1.1% |
### Difficulty Distribution
| Difficulty | Count |
|------------|-------|
| medium | 11,749 |
| phd | 69 |
| hard | 59 |
### Schema
| Column | Type | Description |
|--------|------|-------------|
| `id` | string | Unique identifier |
| `source` | string | Original dataset source |
| `messages` | string | JSON array of `{role, content}` pairs (user/assistant) |
| `domain` | string | Task domain (math, code, science, etc.) |
| `difficulty` | string | Difficulty level (medium, hard, phd) |
| `teacher_model` | string | Model used to generate the response (`claude-opus-4.6`) |
### Filtering Process
1. **Pattern-based refusal detection** — Regex patterns for both EN and KO to detect AI refusal/rejection language
2. **Empty response detection** — Entries where assistant content is missing or under 10 characters
3. **Model-based verification** — 50 random samples verified by Gemma 4 26B model (0 additional refusals found)
4. **Translation quality check** — 30 random EN-KO pairs verified by Gemma 4 26B (all passed)
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("Jongsim/claude-opus-4.6-reasoning-12k-en-filtered")
```
## Related Datasets
- [Jongsim/claude-opus-4.6-reasoning-12k](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k) — Original English (12,842)
- [Jongsim/claude-opus-4.6-reasoning-12k-ko](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko) — Korean translation (12,842)
- [Jongsim/claude-opus-4.6-reasoning-12k-ko-filtered](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko-filtered) — Filtered Korean (12,757)
## License
Apache 2.0
提供机构:
Jongsim



