five

aloks16/mathreasoning

收藏
Hugging Face2026-01-20 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/aloks16/mathreasoning
下载链接
链接失效反馈
官方服务:
资源简介:
--- pretty_name: MathReasoning language: - en license: apache-2.0 tags: - mathematics - reasoning - education - synthetic - chain-of-thought - problem-solving - NCERT - competitive-exams - advanced maths task_categories: - text-generation - question-answering size_categories: - 1M<n<10M paperswithcode_id: null dataset_info: features: - name: prompt dtype: string - name: response dtype: string splits: - name: train num_examples: 3129768 --- # MathReasoning The **MathReasoning Dataset** is a large-scale, high-quality dataset (~3.13M rows) focused on mathematics, logical reasoning, and problem-solving. It is primarily generated through synthetic distillation techniques, complemented by curated open-source educational content. The dataset is designed to train and evaluate language models in mathematical reasoning, quantitative problem-solving, and structured chain-of-thought tasks across domains from basic arithmetic to advanced mathematics. ## Summary • **Domain**: Mathematics, Reasoning, Education • **Scale**: ~3.13M rows, comprehensive mathematical coverage • **Source**: Generated using distillation techniques with open-source educational content • **Goal**: Train and evaluate reasoning-capable LLMs for mathematical problem-solving • **Languages**: English (academic/mathematical focus) • **License**: Apache 2.0 ## Examples Here are some representative examples from the dataset: ```json { "prompt": "Solve: 2x + 5 = 15. Find x.", "response": "Step 1: Subtract 5 from both sides → 2x = 10. Step 2: Divide both sides by 2 → x = 5." } { "prompt": "Evaluate the integral ∫ (x^2) dx.", "response": "The antiderivative of x² is (x³/3) + C. Final Answer: (x³/3) + C." } { "prompt": "Five friends sit in a row. A sits to the left of B, but not next to C. If D is at one end and E is next to B, find the seating arrangement.", "response": "Through elimination and logical reasoning, the arrangement is: D - A - C - B - E." } ``` ## Loading & Usage ```python from datasets import load_dataset dataset = load_dataset("169Pi/mathreasoning") print(dataset["train"][0]) ``` ## Data Preview | prompt | response | |--------|----------| | In the study of topology, determine the conditions under which the following statements hold true: 1. The real line $\Bbb R$ is homeomorphic to $\Bbb R^n$. 2. The unit circle $\Bbb S^1$ is homeomorphic to $\Bbb S^n$. Provide a clear proof for each case. | To address the first statement, suppose there exists a homeomorphism $f: \Bbb R \to \Bbb R^n$. Let $A = (-\infty, 0) \cup (0, \infty)$. Observe that $\Bbb R^n \setminus \{f(0)\}$ is connected for $n > 1$ because removing a single point from an $n$-dimensional Euclidean space with $n > 1$ does not disconnect it... | | Explain why the expectation of a random variable X can be less than a real number 'a' even if X is always less than 'a'. | Even though X is always less than 'a', there may be a small probability that X takes on values much smaller than 'a'. This can lead to the expectation of X being less than 'a', as the small probability of extreme negative values can have a significant impact on the overall average. | ## Dataset Structure | Field | Type | Description | |-------|------|-------------| | prompt | string | Math problem or reasoning question | | response | string | Step-by-step solution and final answer | ## Splits & Statistics • **Train**: 3,129,768 rows ## Mathematical Coverage ### 1. Arithmetic & Number Systems - Addition, subtraction, multiplication, division - Fractions, decimals, ratios, percentages - Number theory basics (LCM, HCF, modular arithmetic) ### 2. Algebra - Linear & quadratic equations - Inequalities, polynomials, factorisation - Progressions (AP, GP, HP) ### 3. Geometry & Mensuration - Euclidean geometry, triangles, circles - Coordinate geometry - Area, volume, perimeter, trigonometry basics ### 4. Advanced Mathematics - Calculus (limits, differentiation, integration basics) - Probability & statistics - Vectors and matrices ### 5. Logical Reasoning & Puzzles - Deductive reasoning, analogies, pattern recognition - Word problems, seating arrangements, mathematical puzzles - Olympiad-style logical challenges ## Question Types - **Numerical Problems:** direct calculations and answers - **Step-by-Step Derivations:** chain-of-thought explanations - **Proof-Style Explanations:** logical/mathematical proofs - **Word Problems & Puzzles:** reasoning-heavy, multi-step logic ## Use Cases • **Model Fine-Tuning for Mathematics** → Enhancing math-solving capabilities of LLMs • **Reasoning Skill Development** → Training models for step-by-step problem solving (Chain-of-Thought) • **STEM Education** → Assisting students and teachers with structured math Q&A • **Exam Preparation** → Practice resource for school exams, competitive tests (JEE, Olympiad) • **EdTech Applications** → Creating intelligent math tutors for adaptive learning • **Automated Solution Generation** → Step-wise solutions for educational platforms • **Evaluation & Benchmarking** → Testing LLM performance in mathematical reasoning • **Cognitive Research** → Studying AI quantitative reasoning and symbolic problem-solving ## Dataset Creation Process 1. **Synthetic Generation (majority)** → Distilled math Q&A aligned with NCERT syllabus and competitive exam styles 2. **Open-Source Material (minority)** → Curated from freely available educational question banks 3. **Validation Pipeline** → Step-by-step validation for accuracy, logical flow, and clarity 4. **Quality Control** → Removed duplicates, ensured correctness of solutions ## Key Features • **Mathematics-Focused Coverage** → Basic arithmetic to advanced problem-solving • **Domain Richness** → School curriculum (NCERT Class 6–12), competitive exams (JEE, Olympiad) • **Reasoning-Centric** → Designed for structured step-by-step mathematical reasoning • **Hybrid Source** → Primarily synthetic (distilled), supplemented by open-source content • **Cleaned & Filtered** → Comprehensive quality assurance and validation ## Impact MathReasoning is among the largest open-source mathematics reasoning datasets (~3.1M Q&A pairs), making it a flagship resource for advancing open-source AI in mathematical reasoning, education, and benchmarking. ## License This dataset is released under the **Apache 2.0 License**, allowing free use, modification, and distribution for research and commercial purposes. ## Limitations & Ethical Considerations • **Synthetic Data**: Primarily generated through distillation techniques, may contain model-specific biases • **Educational Focus**: Designed for learning and evaluation, not to replace human mathematical instruction • **Quality Assurance**: While extensively validated, users should verify critical applications independently • **Domain Scope**: Focused on school-level and competitive exam mathematics • **Bias Mitigation**: Content is mathematical/logical in nature, minimising subjective biases ## Citation ```bibtex @misc{169pi2025mathreasoning, title = {mathreasoning}, author = {169Pi AI Team}, year = {2025}, howpublished = {\url{https://huggingface.co/datasets/169Pi/mathreasoning}}, } ``` ## About 169Pi We are an emerging company building the AI ecosystem, like the Alpie-core suite of models, datasets and more. Our mission is to advance open-source AI research by releasing large-scale, high-quality reasoning datasets across multiple domains (mathematics, education, medical, and more). The MathReasoning Dataset represents our commitment to enhancing mathematical reasoning capabilities in AI systems. ## Community & Contributions • **Issues & Discussions**: Open issues or start discussions on the HuggingFace dataset page • **Contributions**: Pull requests welcome for error reporting, quality improvements, and dataset extensions • **Fine-tuning Results**: Share your model results and improvements with the community • **Collaboration**: Researchers, developers and organisations are encouraged to contribute --- *The MathReasoning Dataset represents a significant contribution to mathematical AI research, education, and reasoning-focused model development.*
提供机构:
aloks16
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作