five

vanty120/Gpt-5.4-Xhigh-Reasoning-2000x

收藏
Hugging Face2026-04-05 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/vanty120/Gpt-5.4-Xhigh-Reasoning-2000x
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en license: apache-2.0 task_categories: - question-answering - text-generation size_categories: - 1K<n<10K tags: - reasoning - math - code - science - distillation - chain-of-thought - gpt-5.4 - gemini-3.1-pro - thinking - sft - hard-reasoning pretty_name: Gpt-5.4-Xhigh-Reasoning-2750x --- # Gpt-5.4-Xhigh-Reasoning-2750x A premium-quality reasoning dataset containing **2,752 elite samples** distilled from **GPT-5.4 XHIGH** (the highest reasoning effort tier of GPT-5.4). Each sample features deep, multi-step Chain-of-Thought traces that are significantly longer and more rigorous than standard GPT-5.4 outputs. This dataset is specifically designed for **Supervised Fine-Tuning (SFT)** to transform general-purpose language models into powerful reasoning models with explicit thinking capabilities. ## Dataset Summary | Property | Value | |---|---| | **Total Samples** | 2,752 | | **Teacher Model** | GPT-5.4 XHIGH (Maximum Reasoning Effort) | | **Seed Data** | [Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) + [Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) + [gemini-3.1-pro-hard-high-reasoning](https://huggingface.co/datasets/Roman1111111/gemini-3.1-pro-hard-high-reasoning) | | **Language** | English | | **Domains** | Mathematics, Code, Science, STEM, Security, Economics, and 60+ expert-level domains | | **Avg. Thinking Length** | ~12,600 characters per sample | ### Why XHIGH? GPT-5.4 supports multiple reasoning effort levels. **XHIGH** is the maximum tier, which forces the model to allocate significantly more compute to its internal chain-of-thought before producing a final answer. This results in: - **Deeper logical decomposition** compared to standard GPT-5.4 outputs - **More self-correction steps** within the reasoning trace - **Higher accuracy** on complex multi-step problems ## Seed Data Sources ### Source 1: Standard Reasoning (2,007 samples) High-quality prompts sourced from [Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) and [Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered), covering math, code, science, and instruction-following tasks. ### Source 2: Gemini 3.1 Pro Hard Reasoning (745 samples) Ultra-hard prompts sourced from [Roman1111111/gemini-3.1-pro-hard-high-reasoning](https://huggingface.co/datasets/Roman1111111/gemini-3.1-pro-hard-high-reasoning). These prompts were originally generated by an agentic workflow (Gemini 3 Flash as orchestrator) using the following high-intensity system instruction: > **System Instruction**: Act as a "Super-Intelligence Evaluator". Generate distinct, complex but solvable prompts that require extreme logic. > > **Requirements**: > 1. **Difficulty**: The question must be unsolvable by simple retrieval. It requires multi-step logic, derivation, or synthesis of conflicting information. > 2. **Concept**: Pick a specific, niche concept within the target domain. > 3. **Prompt Text**: The user prompt should be detailed (code snippets, math proofs, or philosophical paradoxes). > 4. **No Fluff**: Go straight to the hard part. This source spans 60+ expert-level domains including: - **Physics**: QFT, General Relativity, Condensed Matter, Thermodynamics - **Math**: Algebraic Topology, Analytic Number Theory, Category Theory - **Coding/CS**: CUDA/OpenCL HPC, Database Internals, LLVM IR, ZK-Proofs - **Biology/Med**: CRISPR Off-target Analysis, Protein Folding, Pharmacokinetics - **Security**: Prompt Injection Defense, Cryptanalysis, Side-channel Attacks - **Strategic Logic**: Game Theory, Supply Chain Crisis Modeling, Urban Planning - **Benchmarks**: ARC-AGI, LiveCodeBench v6, TheoremQA, MathVista ## Domain Distribution | Category | Count | Percentage | |---|---|---| | Mathematics | 1,581 | 57.4% | | Code | 174 | 6.3% | | Science | 136 | 4.9% | | Instruction Following | 116 | 4.2% | | Prompt Injection & Jailbreak Defense | 70 | 2.5% | | Algebraic Topology | 65 | 2.4% | | Bioinformatics Algorithms | 43 | 1.6% | | Computational Chemistry (DFT) | 41 | 1.5% | | Other Expert Domains (60+) | 526 | 19.1% | ## Difficulty Distribution | Difficulty | Count | Description | |---|---|---| | Medium | 1,986 | Undergraduate level | | Hard | 188 | Professional / competition level | | Extreme | 111 | Research frontier | | Expert | 124 | PhD-level, research-grade problems | | Advanced+ | 343 | Advanced, Super-Intelligence, Graduate, Olympiad | ## Dataset Structure Each sample contains the following fields: ```json { "category": "Algebraic Topology", "difficulty": "Extreme", "instruction": "The original question or problem statement...", "thinking": "Full chain-of-thought reasoning trace from GPT-5.4 XHIGH...", "response": "The final, polished answer..." } ``` | Field | Description | |---|---| | `category` | Domain classification (60+ categories) | | `difficulty` | Difficulty tier: `medium`, `hard`, `extreme`, `expert`, `advanced`, etc. | | `instruction` | The original problem or question | | `thinking` | Complete reasoning trace (Chain-of-Thought) from GPT-5.4 XHIGH | | `response` | Final solution / answer | ## Generation Pipeline 1. **Seed Selection**: High-quality prompts sourced from three complementary datasets covering standard reasoning (math, code, science) and ultra-hard expert-level domains (60+ fields). 2. **Distillation**: Each prompt was processed through **GPT-5.4** with `reasoning_effort=xhigh`, extracting both the internal reasoning trace and the final output. 3. **Quality Control**: Samples with empty thinking or responses were filtered out. Prompt injection artifacts were cleaned from the input. ### Training Format (ChatML with Thinking) ``` <|im_start|>system You are a helpful assistant that thinks step-by-step.<|im_end|> <|im_start|>user {instruction}<|im_end|> <|im_start|>assistant <thinking> {thinking} </thinking> {response}<|im_end|> ``` ## Disclaimers - **LLM Hallucinations**: While GPT-5.4 XHIGH produces highly rigorous outputs, a small number of reasoning errors may still exist. Sample inspection before fine-tuning is recommended. - **Expert Verification**: The hard-reasoning subset contains solutions so technical that they may require subject-matter experts (PhDs) to verify accuracy. - **License**: This dataset is released under the Apache 2.0 license. Usage must comply with [OpenAI's Terms of Service](https://openai.com/policies/terms-of-use). ## Credits - **Teacher Model**: [GPT-5.4](https://openai.com/gpt-5) by OpenAI (XHIGH reasoning effort) - **Seed Datasets**: - [Jackrong/Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) (Alibaba-Superior-Reasoning-Stage2) - [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) - [Roman1111111/gemini-3.1-pro-hard-high-reasoning](https://huggingface.co/datasets/Roman1111111/gemini-3.1-pro-hard-high-reasoning) - **Distillation Pipeline**: Built by [vanty120](https://huggingface.co/vanty120)
提供机构:
vanty120
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作