five

Jongsim/claude-opus-4.6-reasoning-12k-en-filtered-v2

收藏
Hugging Face2026-04-04 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Jongsim/claude-opus-4.6-reasoning-12k-en-filtered-v2
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en license: apache-2.0 size_categories: - 10K<n<100K task_categories: - text-generation tags: - reasoning - chain-of-thought - filtered - claude - opus - claude-only dataset_info: features: - name: id dtype: string - name: source dtype: string - name: messages dtype: string - name: domain dtype: string - name: difficulty dtype: string - name: teacher_model dtype: string splits: - name: train num_examples: 12126 dataset_size: 11600000 configs: - config_name: default data_files: - split: train path: "*.parquet" --- # Claude Opus Reasoning 12K - English (Filtered v2, Claude-Only) A **strictly filtered** version of [Jongsim/claude-opus-4.6-reasoning-12k-en-filtered](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-en-filtered). This dataset contains **12,126 high-quality reasoning examples** generated exclusively by Claude Opus models. All non-Claude data (Qwen-generated) has been removed. ## What changed from Filtered v1 | Version | Entries | Description | |---------|---------|-------------| | Original | 12,842 | Raw merged dataset | | Filtered v1 | 12,757 | Refusal + empty response removal | | **Filtered v2** | **12,126** | v1 + Qwen data removed (Claude-only) | ### v2 Changes - Removed **631 entries** from `Jackrong/Qwen3.5-reasoning-700x` source - Dataset now contains **only Claude Opus-generated** reasoning data - All v1 quality filters still applied (refusal removal, empty response cleanup, broken translation recovery) ## Source Distribution | Source | Count | Percentage | |--------|-------|------------| | Roman1111111/claude-opus-4.6-10000x | 9,601 | 79.2% | | nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,276 | 18.8% | | TeichAI/claude-4.5-opus-high-reasoning-250x | 249 | 2.1% | ## Domain Distribution | Domain | Count | Percentage | |--------|-------|------------| | simple logic and math | 7,468 | 61.6% | | math | 4,143 | 34.2% | | code | 266 | 2.2% | > Note: `science` and `instruction_following` domains were entirely from Qwen source and have been removed. ## Difficulty Distribution | Difficulty | Count | |------------|-------| | medium | 11,749 | | phd | 69 | | hard | 59 | ## Schema | Column | Type | Description | |--------|------|-------------| | `id` | string | Unique identifier | | `source` | string | Original dataset source | | `messages` | string | JSON array of `{role, content}` pairs (user/assistant) | | `domain` | string | Task domain (math, code) | | `difficulty` | string | Difficulty level (medium, hard, phd) | | `teacher_model` | string | Model used to generate the response (`claude-opus-4.6`) | ## Filtering Process 1. **v1 filters**: Refusal detection (58), empty response removal (26), broken translation recovery (11) 2. **v2 filter**: Qwen-sourced data removal (631) — ensures pure Claude Opus reasoning only 3. **Model-based verification**: Gemma 4 26B validation (50 samples, 0 additional issues found) ## Usage ```python from datasets import load_dataset dataset = load_dataset("Jongsim/claude-opus-4.6-reasoning-12k-en-filtered-v2") ``` ## Related Datasets | Dataset | Language | Entries | Description | |---------|----------|---------|-------------| | [claude-opus-4.6-reasoning-12k](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k) | EN | 12,842 | Original | | [claude-opus-4.6-reasoning-12k-ko](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko) | KO | 12,842 | Korean translation | | [claude-opus-4.6-reasoning-12k-en-filtered](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-en-filtered) | EN | 12,757 | Filtered v1 | | [claude-opus-4.6-reasoning-12k-ko-filtered](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko-filtered) | KO | 12,757 | Filtered v1 Korean | | **claude-opus-4.6-reasoning-12k-en-filtered-v2** | **EN** | **12,126** | **Filtered v2 (this)** | | [claude-opus-4.6-reasoning-12k-ko-filtered-v2](https://huggingface.co/datasets/Jongsim/claude-opus-4.6-reasoning-12k-ko-filtered-v2) | KO | 12,126 | Filtered v2 Korean | ## License Apache 2.0
提供机构:
Jongsim
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作