Gacrux-Tiny-1M
收藏魔搭社区2025-12-03 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/Gacrux-Tiny-1M
下载链接
链接失效反馈官方服务:
资源简介:

# **Gacrux-Tiny-1M**
> **Gacrux-Tiny-1M** is a compact, high-quality reasoning dataset curated by **prithivMLmods**, containing **~1.06M chain-of-thought reasoning traces** optimized for mathematical problem solving, algorithmic coding challenges, and structured reasoning across competitive programming tasks. This dataset is ideal for lightweight reasoning model training and benchmarking. The dataset provides real structured problem statements with detailed reasoning step-by-step solutions that demonstrate problem-solving methods relevant for AI tutoring systems, reasoning LLMs, and code-based reasoning tasks.
## Quick Start
```bash
pip install -U datasets
```
```python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Gacrux-Tiny-1M", split="train")
```
## Dataset Overview
| Feature | Value |
| ---------------- | ---------------------------------------- |
| **Total Rows** | ~1,066,324 |
| **Approx. Size** | 12.3 GB |
| **Format** | Parquet |
| **Language** | English |
| **License** | Apache-2.0 |
| **Domains** | Math, competitive programming, reasoning |
| **Tags** | code-x, math, code, agent |
## Data Structure
* **problem**: Task description from math, programming, and logic domains
* **solution**: Chain-of-thought reasoning and final resolution
## Source Inputs
Includes reasoning from:
* **Xen-Arc AI CodeX-2M-Thinking**: [Small traces, depending on the specific problem] Code-x structured programming logic, [XenArcAI/CodeX-2M-Thinking](https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking)
* **Math-aligned custom prompts** : [Gargantua-R1-Compact](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Compact)
* **Hybrid algorithmic reasoning tasks**: [Gargantua-R1-Compact](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Compact)
## Ideal Use Cases
* Fine-tuning small-to-mid scale reasoning models
* LLM alignment on stepwise chain-of-thought reasoning
* Competitive programming tutoring and explanation agents
* Math problem solver model development
* Code reasoning and debugging training frameworks
## Maintainer
| Author | Last Updated |
| --------------------------------------------------------- | ------------ |
| **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **Nov 2025** |
# **Gacrux-Tiny-1M**

> **Gacrux-Tiny-1M** 是由 **prithivMLmods** 精心整理的轻量化高质量推理数据集,包含约106万条思维链(Chain-of-Thought)推理轨迹,针对数学问题求解、算法编程挑战以及程序设计竞赛中的结构化推理任务进行了优化。该数据集非常适用于轻量化推理模型的训练与基准测试。本数据集提供真实的结构化问题描述,附带逐步骤的详细推理解决方案,展示了适用于AI辅导系统、推理型大语言模型(Large Language Model,LLM)以及基于代码的推理任务的问题求解方法。
## 快速开始
bash
pip install -U datasets
python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Gacrux-Tiny-1M", split="train")
## 数据集概览
| 特征项 | 参数值 |
| ---------------- | ---------------------------------------- |
| **总数据行数** | 约1,066,324 |
| **近似总大小** | 12.3 GB |
| **数据格式** | Parquet |
| **语言** | 英语 |
| **开源协议** | Apache-2.0 |
| **应用领域** | 数学、程序设计竞赛、推理 |
| **标签** | code-x, math, code, agent |
## 数据结构
* **problem**:来自数学、编程与逻辑领域的任务描述
* **solution**:思维链推理过程与最终结果
## 数据来源
包含以下来源的推理数据:
* **Xen-Arc AI CodeX-2M-Thinking**:[小型推理轨迹,视具体任务而定] 基于代码的结构化编程逻辑,数据集链接:[XenArcAI/CodeX-2M-Thinking](https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking)
* **数学对齐自定义提示词**:数据集链接:[Gargantua-R1-Compact](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Compact)
* **混合算法推理任务**:数据集链接:[Gargantua-R1-Compact](https://huggingface.co/datasets/prithivMLmods/Gargantua-R1-Compact)
## 理想应用场景
* 中小规模推理模型的微调
* 面向逐步骤思维链推理的大语言模型对齐
* 程序设计竞赛辅导与解释型AI智能体(AI Agent)
* 数学问题求解模型开发
* 代码推理与调试训练框架
## 维护者
| 作者 | 最后更新时间 |
| --------------------------------------------------------- | ------------ |
| **[prithivMLmods](https://huggingface.co/prithivMLmods)** | **2025年11月** |
提供机构:
maas
创建时间:
2025-11-27



