
AtlasUnified/atlas-math-sets-2.0

Source: Hugging Face. Updated 2026-03-26. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/AtlasUnified/atlas-math-sets-2.0
Resource description:
---
language:
- en
license: mit
task_categories:
- text-generation
- question-answering
task_ids:
- explanation-generation
- open-book-qa
pretty_name: Atlas Math Sets
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
  - split: validation
    path: data/validation.jsonl
  - split: test
    path: data/test.jsonl
---

<p align="center">
  <img src="./atlas-math-logo.png" alt="Atlas Math logo" width="320">
</p>

# Atlas Math Sets

Atlas Math Sets is a synthetic math-instruction dataset for training and evaluating models on short-form algebraic reasoning tasks. The dataset is designed around compact instruction-following examples where a model is given a natural-language prompt, a structured equation input, and a normalized target answer.

The current sample shown here focuses on solving simple linear equations with one variable and a labeled difficulty level.

## Dataset Summary

Each example contains:

- `instruction`: a natural-language task prompt
- `input`: the equation or math expression to solve
- `answer`: the normalized symbolic or numeric answer
- `answer_words`: the answer written in words
- `difficulty`: a difficulty label for curriculum-style filtering

This format makes the dataset useful for:

- supervised fine-tuning
- instruction tuning
- evaluation of algebraic reasoning
- curriculum learning by difficulty band
- answer normalization experiments

## Supported Tasks

- Solving one-variable linear equations
- Instruction-following for mathematical reasoning
- Short-form answer generation
- Difficulty-conditioned filtering and evaluation

## Languages

- English

## Dataset Structure

### Data Instances

Each record is a JSON object with the following schema:

```json
{
  "instruction": "Solve the multi-step equation 3y + -4 = 8 - 0.",
  "input": "3y + -4 = 8 - 0",
  "answer": "4",
  "answer_words": "four",
  "difficulty": "level_1"
}
```

### Data Fields

#### `instruction`

Natural-language description of the math task.
Example:

```text
Solve the multi-step equation 3y + -4 = 8 - 0.
```

#### `input`

Structured equation string to be solved.

Example:

```text
3y + -4 = 8 - 0
```

#### `answer`

Canonical short answer, typically numeric.

Example:

```text
4
```

#### `answer_words`

Verbalized form of the answer.

Example:

```text
four
```

#### `difficulty`

Difficulty label for filtering, stratified evaluation, or curriculum training.

Example:

```text
level_1
```

## Example Records

```json
{"instruction": "Solve the multi-step equation 3y + -4 = 8 - 0.", "input": "3y + -4 = 8 - 0", "answer": "4", "answer_words": "four", "difficulty": "level_1"}
{"instruction": "Solve the multi-step equation 3x + 3 = 13 - -2.", "input": "3x + 3 = 13 - -2", "answer": "4", "answer_words": "four", "difficulty": "level_1"}
{"instruction": "Find the solution to -3x + 7 = 39 - -1.", "input": "-3x + 7 = 39 - -1", "answer": "-11", "answer_words": "minus eleven", "difficulty": "level_1"}
{"instruction": "Solve the multi-step equation -2y + 0 = 28 - 8.", "input": "-2y + 0 = 28 - 8", "answer": "-10", "answer_words": "minus ten", "difficulty": "level_1"}
{"instruction": "Find the solution to -2y + 9 = -3 - -4.", "input": "-2y + 9 = -3 - -4", "answer": "4", "answer_words": "four", "difficulty": "level_1"}
```

## Splits

Recommended split structure:

- `train`
- `validation`
- `test`

If your repository currently uses a single file, this card can still be published as-is and updated once explicit split files are added.

## Dataset Creation

### Source Data

This dataset appears to be synthetically generated or programmatically constructed from equation templates. The examples are highly regular in structure and use normalized field formatting suitable for automated generation pipelines.

### Curation Rationale

The goal is to provide a clean, machine-readable corpus for algebra instruction tuning and evaluation.
The paired `answer` and `answer_words` fields support experiments in answer formatting, verbalization, and robust decoding.

## Intended Uses

### Direct Use

- Fine-tuning instruction-following models on algebra tasks
- Benchmarking symbolic accuracy on simple equation solving
- Filtering by difficulty for staged training
- Comparing numeric and verbalized answer generation

### Out-of-Scope Use

This dataset should not be treated as a comprehensive benchmark for advanced mathematics. It appears focused on narrow algebraic patterns and short-answer response formats.

## Limitations

- Likely synthetic rather than naturally occurring educational data
- Limited task diversity in the current sample
- Difficulty labels may reflect generation rules rather than human judgment
- Small answer space may inflate performance for some model classes
- Does not capture full reasoning traces unless chain-of-thought fields are added separately

## Bias, Risks, and Safety

This dataset is low risk compared with open-domain corpora, but users should still be aware of the following:

- Synthetic task distributions may not match real student errors or natural math phrasing
- Models trained on templated equations may overfit formatting patterns
- Strong benchmark performance on this dataset may not transfer to broader mathematical reasoning

## Recommended Evaluation

Useful metrics include:

- exact match on `answer`
- normalized exact match after whitespace and sign cleanup
- accuracy by `difficulty`
- agreement between `answer` and generated verbalized answer

## Training Example

```python
from datasets import load_dataset

# Option 1: local JSONL files (uncomment if you have the files on disk)
# dataset = load_dataset("json", data_files={
#     "train": "data/train.jsonl",
#     "validation": "data/validation.jsonl",
#     "test": "data/test.jsonl",
# })

# Option 2: Hugging Face Hub, using the repository ID of this dataset page
dataset = load_dataset("AtlasUnified/atlas-math-sets-2.0")
print(dataset)
```

## Prompting Example

```python
example = {
    "instruction": "Solve the multi-step equation 2x + -3 = 14 - -1.",
    "input": "2x + -3 = 14 - -1",
    "answer": "9",
    "answer_words": "nine",
    "difficulty": "level_1",
}

# Build a plain-text prompt that ends where the model should write the answer.
prompt = f"Instruction: {example['instruction']}\nInput: {example['input']}\nAnswer:"
print(prompt)
```

## Suggested Repository Layout

```text
atlas-math-sets/
├── README.md
├── data/
│   ├── train.jsonl
│   ├── validation.jsonl
│   └── test.jsonl
└── LICENSE
```

## Citation

If you use this dataset, cite the repository or dataset page associated with Atlas Math Sets.

If you want a formal BibTeX citation, add it here once publication metadata is finalized.

```bibtex
@dataset{atlas_math_sets,
  title  = {Atlas Math Sets},
  author = {AtlasUnified},
  year   = {2026},
  note   = {Hugging Face dataset}
}
```

## License

MIT
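
## Evaluation Sketch

The metrics listed under Recommended Evaluation (normalized exact match and accuracy by `difficulty`) could be computed roughly as follows. This is a minimal sketch, not part of the dataset tooling: the helper names `normalize_answer` and `accuracy_by_difficulty` are hypothetical, the cleanup rules (whitespace removal, Unicode minus, leading `+`) are one assumed reading of "whitespace and sign cleanup", and the predictions are made-up model outputs.

```python
from collections import defaultdict


def normalize_answer(text: str) -> str:
    """Assumed cleanup: drop whitespace, unify minus signs, strip a leading '+'."""
    cleaned = "".join(text.split())            # remove all whitespace
    cleaned = cleaned.replace("\u2212", "-")   # Unicode minus -> ASCII hyphen-minus
    if cleaned.startswith("+"):
        cleaned = cleaned[1:]
    return cleaned


def accuracy_by_difficulty(records, predictions):
    """Return normalized exact-match accuracy for each difficulty label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record, prediction in zip(records, predictions):
        level = record["difficulty"]
        totals[level] += 1
        if normalize_answer(prediction) == normalize_answer(record["answer"]):
            hits[level] += 1
    return {level: hits[level] / totals[level] for level in totals}


# Reference answers taken from the example records in this card.
records = [
    {"answer": "4", "difficulty": "level_1"},
    {"answer": "-11", "difficulty": "level_1"},
    {"answer": "-10", "difficulty": "level_1"},
]
# Hypothetical model outputs: two match after cleanup, one has the wrong sign.
predictions = [" 4 ", "- 11", "10"]

print(accuracy_by_difficulty(records, predictions))
```

Grouping by `difficulty` before averaging keeps an easy band from masking errors in a harder one, which matters once the dataset contains more than the single `level_1` label shown in the sample.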
Provider:
AtlasUnified