sklmindforge/llm_arithmetic_training
Source: Hugging Face · Updated: 2026-03-24 · Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/sklmindforge/llm_arithmetic_training
Resource description:
---
license: apache-2.0
language:
- en
size_categories:
- 1M<n<10M
---
# Parallax-CoT: 1GB Arithmetic Reasoning Dataset
## Overview
This dataset is designed for **Curriculum Learning** in Small Language Models (SLMs). It focuses on "weight hardening": strengthening a model's internal attention mechanisms (specifically those of Parallax 0.5B) on basic arithmetic so that it is better prepared for complex symbolic reasoning, code generation, and high-level mathematics (Calculus/Physics).
## Dataset Structure
The data follows a **Chain-of-Thought (CoT)** format wrapped in `<think>` tags.
- **Format:** JSONL
- **Total Size:** 1 GB (over 250M tokens)
- **Operations:** Addition, Subtraction, Multiplication, Division.
- **Complexity:** Up to 6-digit integers with multi-step carries, borrows, and partial products.
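
As a rough illustration of how a record in this format might be consumed, the sketch below parses one JSONL line and separates the `<think>` reasoning from the final answer. The field name `text` and the layout of the sample line are assumptions made for this example; the card does not document the actual schema, so inspect a real record before relying on it.

```python
import json
import re

# Hypothetical record: the field name "text" and the trace wording are
# assumptions for illustration, not the dataset's documented schema.
SAMPLE_LINE = json.dumps({
    "text": "What is 47 + 38? <think>7 + 8 = 15, write 5 carry 1. "
            "4 + 3 + 1 = 8.</think> 85"
})

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (prompt, reasoning, answer) for a string with one <think> block."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole string as the prompt.
        return text.strip(), "", ""
    prompt = text[:match.start()].strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return prompt, reasoning, answer

record = json.loads(SAMPLE_LINE)
prompt, reasoning, answer = split_reasoning(record["text"])
```

Keeping the reasoning and the answer separate makes it possible to check the final result independently of the trace, for example when filtering malformed samples.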
## Purpose: "The Hardening Phase"
Unlike standard math datasets that focus on result accuracy, this dataset is built to:
1. **Develop Procedural Logic:** Forcing the model to predict the *process* before the *result*.
2. **Expand Context Handling:** Training the model to maintain state (carries, borrows, partial results) across long token sequences.
3. **Bridge to Code:** Serving as a foundational step before fine-tuning on Python/Sandbox interaction.
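
To make the "process before result" idea concrete, here is a minimal sketch of how a digit-by-digit addition trace with explicit carries could be generated. The function name `addition_trace` and the wording of each step are hypothetical; this is not the generator used to build the dataset, only one plausible shape for such a sample.

```python
def addition_trace(a, b):
    """Build a column-addition trace for non-negative integers a and b.

    The trace lists each place value with its carry inside <think> tags,
    followed by the final sum on the last line.
    """
    steps = []
    digits = []
    carry = 0
    place = 0
    x, y = a, b
    while x > 0 or y > 0 or carry > 0:
        da, db = x % 10, y % 10
        total = da + db + carry
        digit, new_carry = total % 10, total // 10
        steps.append(
            f"place 10^{place}: {da} + {db} + carry {carry} = {total}, "
            f"write {digit}, carry {new_carry}"
        )
        digits.append(str(digit))
        carry = new_carry
        x //= 10
        y //= 10
        place += 1
    # Digits were collected least significant first; 0 + 0 yields no digits.
    result = "".join(reversed(digits)) or "0"
    return "<think>\n" + "\n".join(steps) + "\n</think>\n" + result

trace = addition_trace(478, 256)
```

Because the result appears only after the closing `</think>` tag, a model trained on samples of this shape has to emit every carry step before it can emit the answer.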
## Usage
```python
from datasets import load_dataset

# Stream records on demand instead of downloading the full ~1GB up front.
dataset = load_dataset("sklmindforge/llm_arithmetic_training", streaming=True)
```
Provider:
sklmindforge



