prithivMLmods/Gacrux-Tiny-1M

Name: prithivMLmods/Gacrux-Tiny-1M
Creator: prithivMLmods
Published: 2025-11-26 04:28:21
License: 暂无描述

Hugging Face2025-11-26 更新2025-12-20 收录

下载链接：

https://hf-mirror.com/datasets/prithivMLmods/Gacrux-Tiny-1M

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: apache-2.0 task_categories: - text-generation - question-answering language: - en tags: - code-x - code - math - agent size_categories: - 1M<n<10M --- ![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/u4BHIlU7aiN6J6abaSHbJ.png) # **Gacrux-Tiny-1M** > **Gacrux-Tiny-1M** is a compact, high-quality reasoning dataset curated by **prithivMLmods**, containing **~1.06M chain-of-thought reasoning traces** optimized for mathematical problem solving, algorithmic coding challenges, and structured reasoning across competitive programming tasks. This dataset is ideal for lightweight reasoning model training and benchmarking. The dataset provides real structured problem statements with detailed reasoning step-by-step solutions that demonstrate problem-solving methods relevant for AI tutoring systems, reasoning LLMs, and code-based reasoning tasks. ## Quick Start ```bash pip install -U datasets ``` ```python from datasets import load_dataset dataset = load_dataset("prithivMLmods/Gacrux-Tiny-1M", split="train") ``` ## Dataset Overview | Feature | Value | | ---------------- | ---------------------------------------- | | **Total Rows** | ~1,066,324 | | **Approx. Size** | 12.3 GB | | **Format** | Parquet | | **Language** | English | | **License** | Apache-2.0 | | **Domains** | Math, competitive programming, reasoning | | **Tags** | code-x, math, code, agent | ## Data Structure * **problem**: Task description from math, programming, and logic domains * **solution**: Chain-of-thought reasoning and final resolution ## Source Inputs Includes reasoning from: * **Xen-Arc AI CodeX-2M-Thinking**: [Small traces, depending on the specific problem] Code-x structured programming logic, [XenArcAI/CodeX-2M-Thinking](https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking) * **Math-aligned custom prompts** : [Gargantua-R1-Compact](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Compact) * **Hybrid algorithmic reasoning tasks**: [Gargantua-R1-Compact](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Compact) ## Ideal Use Cases * Fine-tuning small-to-mid scale reasoning models * LLM alignment on stepwise chain-of-thought reasoning * Competitive programming tutoring and explanation agents * Math problem solver model development * Code reasoning and debugging training frameworks ## Maintainer | Author | Last Updated | | --------------------------------------------------------- | ------------ | | **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **Nov 2025** |

提供机构：

prithivMLmods

5,000+

优质数据集

54 个

任务类型

进入经典数据集