sklmindforge/llm_addition_training
Hugging Face · Updated 2026-03-20 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/sklmindforge/llm_addition_training
Resource description:
# LLM Addition Training Dataset (Odometer-Style Logic)
## Overview
This dataset is designed to teach Large Language Models (LLMs) the foundational logic of addition through **Chain-of-Thought (CoT)** and **Place-Value Expansion**. Instead of simple $A + B = C$ pairs, this dataset forces the model to "think" through the process of splitting numbers into their constituent parts (units, tens, hundreds, thousands) and adding them step-by-step.
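The place-value expansion described above can be written as a worked equation. As a sketch, write the split operand $B$ in terms of its decimal digits and add its place-value parts to $A$ one at a time. The symbols $d_k$ and $m$ are notation introduced here for illustration, not taken from the card:

$$
B = \sum_{k=0}^{m} d_k \, 10^k, \qquad A + B = \Bigl(\cdots\bigl((A + d_m 10^m) + d_{m-1} 10^{m-1}\bigr) + \cdots\Bigr) + d_0 10^0
$$

Applied to the card's own example, where $123 = 100 + 20 + 3$:

$$
53242 + 123 = \bigl((53242 + 100) + 20\bigr) + 3 = (53342 + 20) + 3 = 53362 + 3 = 53365
$$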
## Dataset Structure
The dataset consists of approximately 60,000 examples across three complexity tiers:
1. **Foundation Tier (1-20):** Direct recall for small-number addition.
2. **Expansion Tier (21-150):** Introduces splitting numbers into tens and units.
3. **Odometer Tier (151-99,999):** Multi-digit addition using recursive place-value logic.
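The card does not say whether each tier's range bounds an operand or the sum. The following minimal sketch assumes the range applies to the larger operand and shows how a problem could be assigned to a tier. The `TIERS` table and the `tier_of` function are hypothetical names, not part of the dataset:

```python
# Hypothetical tier table mirroring the three tiers listed in this card.
# Assumption: a tier's range bounds the larger operand, not the sum.
TIERS = [
    ("Foundation", 1, 20),
    ("Expansion", 21, 150),
    ("Odometer", 151, 99_999),
]


def tier_of(a: int, b: int) -> str:
    """Return the tier name for the problem a + b, keyed on max(a, b)."""
    largest = max(a, b)
    for name, low, high in TIERS:
        if low <= largest <= high:
            return name
    raise ValueError(f"{a} + {b} falls outside every tier")


print(tier_of(7, 5))        # Foundation
print(tier_of(40, 3))       # Expansion
print(tier_of(53242, 123))  # Odometer
```

If the ranges turn out to bound the sum instead, only the `largest = max(a, b)` line would need to change to `a + b`.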
### Format
Each entry follows a consistent text-completion format with three fields:
- **Question:** The addition problem.
- **Think:** The logical breakdown (e.g., "Split 532 into 500 then 30 then 2").
- **Result:** The final verified sum.
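To make the three fields concrete, here is a minimal sketch that builds one entry in the style of the Logic Example in this card. It assumes that the second operand is the one being split, that zero digits are skipped, and that fields are plain `Question:` / `Think:` / `Result:` lines. The exact on-disk layout of the dataset is not documented here, and `format_entry` and `place_value_parts` are hypothetical helpers:

```python
def place_value_parts(n: int) -> list[int]:
    """Non-zero place-value parts of n, largest first (532 -> [500, 30, 2])."""
    digits = str(n)
    return [
        int(d) * 10 ** (len(digits) - 1 - i)
        for i, d in enumerate(digits)
        if d != "0"  # assumption: zero places are skipped, e.g. 105 -> [100, 5]
    ]


def format_entry(a: int, b: int) -> str:
    """Build a Question/Think/Result entry by splitting the second operand b."""
    if a < 1 or b < 1:
        raise ValueError("expects positive operands")
    parts = place_value_parts(b)
    steps = []
    running = a
    for part in parts:
        new_total = running + part
        steps.append(f"{running} + {part} = {new_total}")
        running = new_total
    think = (
        f"Split {b} into " + " then ".join(str(p) for p in parts)
        + ". " + " -> ".join(steps)
    )
    return f"Question: {a} + {b}\nThink: {think}\nResult: {running}"


print(format_entry(53242, 123))
```

For `53242 + 123` this reproduces the card's Logic Example line for line, minus the markdown bold on the field names.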
## Intended Use
This dataset is ideal for:
- **Continued Pre-training:** Injecting arithmetic stability into small models (0.1B to 3B parameters).
- **Fine-Tuning:** Teaching a model to follow a specific "scratchpad" reasoning format.
- **Arithmetic Benchmarking:** Testing whether a model can handle multi-digit carry-over logic.
## Logic Example
**Question:** 53242 + 123
**Think:** Split 123 into 100 then 20 then 3. 53242 + 100 = 53342 -> 53342 + 20 = 53362 -> 53362 + 3 = 53365
**Result:** 53365
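Because each step in the Think chain is a small, checkable equation, a chain like the one above can be verified mechanically. The following sketch is a hypothetical checker (`check_think` is not part of the dataset). It assumes that steps use the `X + Y = Z` form shown in the example, that each step must start from the previous running total, and that the addends must sum to the second operand:

```python
import re

# Matches steps of the form "53242 + 100 = 53342" inside a Think string.
STEP_RE = re.compile(r"(\d+) \+ (\d+) = (\d+)")


def check_think(question: str, think: str, result: int) -> bool:
    """Check that every step chains from the previous total, is arithmetically
    correct, that the addends sum to b, and that the final total is the result."""
    a, b = (int(x) for x in question.split("+"))
    steps = STEP_RE.findall(think)
    if not steps:
        return False
    running, added = a, 0
    for lhs, addend, rhs in steps:
        lhs, addend, rhs = int(lhs), int(addend), int(rhs)
        if lhs != running or lhs + addend != rhs:
            return False
        running, added = rhs, added + addend
    return added == b and running == result == a + b


think = ("Split 123 into 100 then 20 then 3. "
         "53242 + 100 = 53342 -> 53342 + 20 = 53362 -> 53362 + 3 = 53365")
print(check_think("53242 + 123", think, 53365))  # True
print(check_think("53242 + 123", think, 53366))  # False
```

Foundation-tier entries that rely on direct recall, with no intermediate steps, would be rejected by this checker. It is only meant for entries that contain a step-by-step chain.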
## License
MIT
Provided by:
sklmindforge



