AtlasUnified/atlas-math-sets-2.0
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/AtlasUnified/atlas-math-sets-2.0
---
language:
- en
license: mit
task_categories:
- text-generation
- question-answering
task_ids:
- explanation-generation
- open-book-qa
pretty_name: Atlas Math Sets
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
  - split: validation
    path: data/validation.jsonl
  - split: test
    path: data/test.jsonl
---
<p align="center">
<img src="./atlas-math-logo.png" alt="Atlas Math logo" width="320">
</p>
# Atlas Math Sets
Atlas Math Sets is a synthetic math-instruction dataset for training and evaluating models on short-form algebraic reasoning tasks. The dataset is designed around compact instruction-following examples where a model is given a natural-language prompt, a structured equation input, and a normalized target answer.
The sample records shown in this card are one-variable linear equations, each labeled with a difficulty level.
## Dataset Summary
Each example contains:
- `instruction`: a natural-language task prompt
- `input`: the equation or math expression to solve
- `answer`: the normalized symbolic or numeric answer
- `answer_words`: the answer written in words
- `difficulty`: a difficulty label for curriculum-style filtering
This format makes the dataset useful for:
- supervised fine-tuning
- instruction tuning
- evaluation of algebraic reasoning
- curriculum learning by difficulty band
- answer normalization experiments
## Supported Tasks
- Solving one-variable linear equations
- Instruction-following for mathematical reasoning
- Short-form answer generation
- Difficulty-conditioned filtering and evaluation
## Languages
- English
## Dataset Structure
### Data Instances
Each record is a JSON object with the following schema:
```json
{
  "instruction": "Solve the multi-step equation 3y + -4 = 8 - 0.",
  "input": "3y + -4 = 8 - 0",
  "answer": "4",
  "answer_words": "four",
  "difficulty": "level_1"
}
```
### Data Fields
#### `instruction`
Natural-language description of the math task.
Example:
```text
Solve the multi-step equation 3y + -4 = 8 - 0.
```
#### `input`
Structured equation string to be solved.
Example:
```text
3y + -4 = 8 - 0
```
#### `answer`
Canonical short answer, typically numeric.
Example:
```text
4
```
#### `answer_words`
Verbalized form of the answer.
Example:
```text
four
```
#### `difficulty`
Difficulty label for filtering, stratified evaluation, or curriculum training.
Example:
```text
level_1
```
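Because every record shares the same five fields, a quick schema check can catch malformed lines before training. The sketch below is a minimal validator, assuming all values are strings (as in the samples). The name `validate_record` and the check that `input` appears inside `instruction` are illustrative heuristics, not part of any official dataset tooling.

```python
import json

# Field names follow the "Data Fields" section; treating every value as a string
# is an assumption based on the sample records (e.g. "answer": "4").
REQUIRED_FIELDS = ("instruction", "input", "answer", "answer_words", "difficulty")


def validate_record(record: dict) -> list:
    """Return a list of problems found in one record; an empty list means it looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], str):
            problems.append(f"field is not a string: {field}")
    extra = sorted(set(record) - set(REQUIRED_FIELDS))
    if extra:
        problems.append(f"unexpected fields: {extra}")
    # Heuristic: in every sample record the equation in `input` also appears in `instruction`.
    if isinstance(record.get("input"), str) and isinstance(record.get("instruction"), str):
        if record["input"] not in record["instruction"]:
            problems.append("input does not appear in instruction")
    return problems


line = ('{"instruction": "Solve the multi-step equation 3y + -4 = 8 - 0.", '
        '"input": "3y + -4 = 8 - 0", "answer": "4", "answer_words": "four", '
        '"difficulty": "level_1"}')
print(validate_record(json.loads(line)))  # []
```

To check a whole split, apply `validate_record` to each line of a JSONL file after parsing it with `json.loads`.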
## Example Records
```json
{"instruction": "Solve the multi-step equation 3y + -4 = 8 - 0.", "input": "3y + -4 = 8 - 0", "answer": "4", "answer_words": "four", "difficulty": "level_1"}
{"instruction": "Solve the multi-step equation 3x + 3 = 13 - -2.", "input": "3x + 3 = 13 - -2", "answer": "4", "answer_words": "four", "difficulty": "level_1"}
{"instruction": "Find the solution to -3x + 7 = 39 - -1.", "input": "-3x + 7 = 39 - -1", "answer": "-11", "answer_words": "minus eleven", "difficulty": "level_1"}
{"instruction": "Solve the multi-step equation -2y + 0 = 28 - 8.", "input": "-2y + 0 = 28 - 8", "answer": "-10", "answer_words": "minus ten", "difficulty": "level_1"}
{"instruction": "Find the solution to -2y + 9 = -3 - -4.", "input": "-2y + 9 = -3 - -4", "answer": "4", "answer_words": "four", "difficulty": "level_1"}
```
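The sample answers can be checked by solving each `input` directly. This sketch assumes every equation follows the `<a><var> + <b> = <c> - <d>` pattern seen in the five records above; other shapes (an implicit coefficient such as `x`, fractions, several variables) are not handled and raise `ValueError`. `Fraction` keeps non-integer solutions exact; how the dataset writes such answers is not shown, so the `p/q` output form is an assumption.

```python
import re
from fractions import Fraction

# Assumed template, inferred only from the sample records:
#   "<a><var> + <b> = <c> - <d>", with integers a, b, c, d that may be negative.
EQUATION = re.compile(r"^(-?\d+)([a-z]) \+ (-?\d+) = (-?\d+) - (-?\d+)$")


def solve_linear(equation: str) -> Fraction:
    """Solve a*v + b = c - d for v, returning an exact fraction."""
    match = EQUATION.match(equation.strip())
    if match is None:
        raise ValueError(f"unsupported equation format: {equation!r}")
    a, _var, b, c, d = match.groups()
    a, b, c, d = int(a), int(b), int(c), int(d)
    if a == 0:
        raise ValueError("coefficient of the variable is zero")
    return Fraction((c - d) - b, a)


samples = [  # (input, answer) pairs copied from the example records above
    ("3y + -4 = 8 - 0", "4"),
    ("3x + 3 = 13 - -2", "4"),
    ("-3x + 7 = 39 - -1", "-11"),
    ("-2y + 0 = 28 - 8", "-10"),
    ("-2y + 9 = -3 - -4", "4"),
]
for equation, expected in samples:
    got = str(solve_linear(equation))  # str(Fraction(4, 1)) == "4"
    status = "ok" if got == expected else f"mismatch (expected {expected})"
    print(f"{equation:<20} -> {got:>4}  {status}")
```

All five sample records solve to their stated `answer`, which makes this a cheap consistency check to run over each split.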
## Splits
Recommended split structure:
- `train`
- `validation`
- `test`
If your repository currently uses a single file, this card can still be published as-is and updated once explicit split files are added.
## Dataset Creation
### Source Data
This dataset appears to be synthetically generated or programmatically constructed from equation templates. The examples are highly regular in structure and use normalized field formatting suitable for automated generation pipelines.
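Since the records look template-generated, a toy generator makes that structure concrete. The sketch below is hypothetical: it reproduces the visible record shape (two instruction phrasings, `<a><var> + <b> = <c> - <d>` equations, integer answers), but the coefficient ranges, the small word table, and the fixed `level_1` label are assumptions, not the authors' pipeline.

```python
import random

# Hypothetical generator that mimics the *shape* of the sample records only.
PHRASINGS = ("Solve the multi-step equation {eq}.", "Find the solution to {eq}.")
SMALL_WORDS = ("zero", "one", "two", "three", "four", "five", "six",
               "seven", "eight", "nine", "ten", "eleven", "twelve")


def verbalize_small(n: int) -> str:
    # Negative answers use the "minus" prefix seen in the samples ("minus eleven").
    return ("minus " if n < 0 else "") + SMALL_WORDS[abs(n)]


def make_record(rng: random.Random) -> dict:
    var = rng.choice("xy")
    a = rng.choice((-3, -2, 2, 3))   # assumed nonzero coefficients
    solution = rng.randint(-12, 12)  # chosen first so the answer is always an integer
    b = rng.randint(-10, 10)
    d = rng.randint(-10, 10)
    c = a * solution + b + d         # guarantees a*solution + b == c - d
    eq = f"{a}{var} + {b} = {c} - {d}"
    return {
        "instruction": rng.choice(PHRASINGS).format(eq=eq),
        "input": eq,
        "answer": str(solution),
        "answer_words": verbalize_small(solution),
        "difficulty": "level_1",     # placeholder: the real labeling rule is not documented
    }


rng = random.Random(0)
for _ in range(3):
    print(make_record(rng))
```

Choosing the solution first and deriving `c` from it is what keeps every generated equation solvable with a clean integer answer.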
### Curation Rationale
The goal is to provide a clean, machine-readable corpus for algebra instruction tuning and evaluation. The paired `answer` and `answer_words` fields support experiments in answer formatting, verbalization, and robust decoding.
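The paired `answer` and `answer_words` fields can be cross-checked mechanically. Below is a small integer verbalizer, assuming the convention visible in the samples (lowercase words, "minus" for negatives). The samples only show numbers below 20, so the hyphenated tens ("twenty-one") and the "hundred" form without "and" are assumptions; `int_to_words` and `words_match` are hypothetical helper names.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Verbalize an integer with |n| < 1000 (forms above nineteen are assumed)."""
    if n < 0:
        return "minus " + int_to_words(-n)
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + int_to_words(rest))
    raise ValueError("only |n| < 1000 is handled in this sketch")


def words_match(record: dict) -> bool:
    # Assumes `answer` holds an integer string, as in every sample record.
    return int_to_words(int(record["answer"])) == record["answer_words"]


print(words_match({"answer": "-11", "answer_words": "minus eleven"}))  # True
```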
## Intended Uses
### Direct Use
- Fine-tuning instruction-following models on algebra tasks
- Benchmarking symbolic accuracy on simple equation solving
- Filtering by difficulty for staged training
- Comparing numeric and verbalized answer generation
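Staged training by difficulty needs a stable ordering of the labels. This sketch assumes labels follow the `level_<n>` pattern of the samples and that a larger `n` means harder; only `level_1` actually appears in the samples, so the other labels in the demo are hypothetical.

```python
import re
from collections import defaultdict


def difficulty_rank(label: str) -> int:
    """Map a label such as "level_1" to its integer level (assumed: higher means harder)."""
    match = re.fullmatch(r"level_(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected difficulty label: {label!r}")
    return int(match.group(1))


def curriculum_stages(records):
    """Group records by difficulty and return the groups from easiest to hardest."""
    stages = defaultdict(list)
    for record in records:
        stages[difficulty_rank(record["difficulty"])].append(record)
    # Sorting on the integer avoids "level_10" landing before "level_2".
    return [stages[level] for level in sorted(stages)]


# Only "level_1" appears in the samples; the other labels here are hypothetical.
records = [
    {"input": "-2y + 9 = -3 - -4", "difficulty": "level_1"},
    {"input": "hypothetical harder item", "difficulty": "level_10"},
    {"input": "hypothetical medium item", "difficulty": "level_2"},
]
for stage in curriculum_stages(records):
    print(stage[0]["difficulty"], len(stage))
```

With the Hugging Face `datasets` library, a single band can also be selected directly, e.g. `dataset["train"].filter(lambda ex: ex["difficulty"] == "level_1")`.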
### Out-of-Scope Use
This dataset should not be treated as a comprehensive benchmark for advanced mathematics. It appears focused on narrow algebraic patterns and short-answer response formats.
## Limitations
- Likely synthetic rather than naturally occurring educational data
- Limited task diversity in the current sample
- Difficulty labels may reflect generation rules rather than human judgment
- Small answer space may inflate performance for some model classes
- Does not capture full reasoning traces unless chain-of-thought fields are added separately
## Bias, Risks, and Safety
This dataset is low risk compared with open-domain corpora, but users should still be aware of the following:
- Synthetic task distributions may not match real student errors or natural math phrasing
- Models trained on templated equations may overfit formatting patterns
- Strong benchmark performance on this dataset may not transfer to broader mathematical reasoning
## Recommended Evaluation
Useful metrics include:
- exact match on `answer`
- normalized exact match after whitespace and sign cleanup
- accuracy by `difficulty`
- agreement between `answer` and the generated verbalized answer
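The first three metrics can be computed with a few lines of standard-library Python. The normalization rules here (removing all whitespace, mapping the Unicode minus sign `−` to `-`, dropping a leading `+`) are one assumed reading of "whitespace and sign cleanup", and per-difficulty accuracy is reported on the normalized match. The function names are illustrative.

```python
import re
from collections import defaultdict


def normalize(answer: str) -> str:
    """Assumed cleanup: drop whitespace, map Unicode minus to '-', drop a leading '+'."""
    text = re.sub(r"\s+", "", answer).replace("\u2212", "-")
    return text[1:] if text.startswith("+") else text


def evaluate(predictions, references):
    """predictions: list of strings; references: records with 'answer' and 'difficulty'."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    exact = normalized = 0
    by_level = defaultdict(lambda: [0, 0])  # difficulty -> [correct, total]
    for pred, ref in zip(predictions, references):
        exact += pred == ref["answer"]
        hit = normalize(pred) == normalize(ref["answer"])
        normalized += hit
        by_level[ref["difficulty"]][0] += hit
        by_level[ref["difficulty"]][1] += 1
    n = len(references)
    return {
        "exact_match": exact / n,
        "normalized_exact_match": normalized / n,
        "accuracy_by_difficulty": {k: c / t for k, (c, t) in by_level.items()},
    }


refs = [{"answer": a, "difficulty": "level_1"} for a in ("4", "4", "-11", "-10")]
preds = ["4", " 4", "\u221211", "10"]
print(evaluate(preds, refs))
# {'exact_match': 0.25, 'normalized_exact_match': 0.75, 'accuracy_by_difficulty': {'level_1': 0.75}}
```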
## Training Example
```python
from datasets import load_dataset

# Option 1: load from the Hugging Face Hub (repository id taken from this dataset's page)
dataset = load_dataset("AtlasUnified/atlas-math-sets-2.0")

# Option 2: load local JSONL files laid out as in data/
# dataset = load_dataset("json", data_files={
#     "train": "data/train.jsonl",
#     "validation": "data/validation.jsonl",
#     "test": "data/test.jsonl",
# })

print(dataset)
```
## Prompting Example
```python
example = {
    "instruction": "Solve the multi-step equation 2x + -3 = 14 - -1.",
    "input": "2x + -3 = 14 - -1",
    "answer": "9",
    "answer_words": "nine",
    "difficulty": "level_1",
}
prompt = f"Instruction: {example['instruction']}\nInput: {example['input']}\nAnswer:"
print(prompt)
```
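For supervised fine-tuning, the same prompt template can be paired with a target completion. This is a sketch under stated assumptions: the `prompt`/`completion` keys follow a common prompt-completion convention (check what your training library expects), the leading space before the target is a formatting choice, and `to_sft_pair` is a hypothetical helper name.

```python
def to_sft_pair(example: dict, use_words: bool = False) -> dict:
    """Split a record into a prompt (same template as above) and a target completion."""
    prompt = f"Instruction: {example['instruction']}\nInput: {example['input']}\nAnswer:"
    # Train on either the numeric answer or its verbalized form; this is a design choice.
    target = example["answer_words"] if use_words else example["answer"]
    # A leading space keeps the completion separated from "Answer:" when concatenated.
    return {"prompt": prompt, "completion": " " + target}


example = {
    "instruction": "Solve the multi-step equation 2x + -3 = 14 - -1.",
    "input": "2x + -3 = 14 - -1",
    "answer": "9",
    "answer_words": "nine",
    "difficulty": "level_1",
}
print(to_sft_pair(example))
print(repr(to_sft_pair(example, use_words=True)["completion"]))  # ' nine'
```

Applied with `dataset.map(to_sft_pair)`, this adds `prompt` and `completion` columns alongside the original fields.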
## Suggested Repository Layout
```text
atlas-math-sets/
├── README.md
├── data/
│ ├── train.jsonl
│ ├── validation.jsonl
│ └── test.jsonl
└── LICENSE
```
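The layout above stores one JSON object per line in each split file. A minimal sketch for producing those files from in-memory records, assuming a hypothetical helper `write_splits` and UTF-8 encoding; `README.md` and `LICENSE` are not generated, and the demo writes into a temporary directory rather than a real repository.

```python
import json
import tempfile
from pathlib import Path


def write_splits(splits: dict, root: str) -> list:
    """Write each split to <root>/data/<split>.jsonl (one JSON object per line); return the paths."""
    data_dir = Path(root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, records in splits.items():
        path = data_dir / f"{name}.jsonl"
        with path.open("w", encoding="utf-8") as handle:
            for record in records:
                handle.write(json.dumps(record, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths


record = {"instruction": "Find the solution to -2y + 9 = -3 - -4.", "input": "-2y + 9 = -3 - -4",
          "answer": "4", "answer_words": "four", "difficulty": "level_1"}
# A temporary directory stands in for the repository root in this demo.
root = tempfile.mkdtemp()
written = write_splits({"train": [record], "validation": [], "test": []}, root)
print([p.name for p in written])  # ['train.jsonl', 'validation.jsonl', 'test.jsonl']
```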
## Citation
If you use this dataset, cite the repository or dataset page associated with Atlas Math Sets. The BibTeX entry below is provisional until publication metadata is finalized.
```bibtex
@dataset{atlas_math_sets,
title = {Atlas Math Sets},
author = {AtlasUnified},
year = {2026},
note = {Hugging Face dataset}
}
```
## License
MIT
Provided by:
AtlasUnified



