Exploring Human-Like Mathematical Reasoning: Perspectives on Generalizability and Efficiency

Name: Exploring Human-Like Mathematical Reasoning: Perspectives on Generalizability and Efficiency
Creator: University of Notre Dame
Published: 2024-12-03 15:59:34
License: No description available

DataCite Commons: updated 2024-12-03, indexed 2025-04-17

Download link:

https://curate.nd.edu/articles/dataset/Exploring_Human-Like_Mathematical_Reasoning_Perspectives_on_Generalizability_and_Efficiency/27895872

Description:

Mathematical reasoning, a fundamental aspect of human cognition, poses significant challenges for artificial intelligence (AI) systems. Despite recent advances in natural language processing (NLP) and large language models (LLMs), AI's ability to replicate human-like reasoning, generalization, and efficiency remains an open research challenge. In this dissertation, we address key limitations in math word problem (MWP) solving, focusing on the accuracy, generalization ability, and efficiency of AI-based mathematical reasoners by applying human-like reasoning methods and principles. The dissertation introduces several innovative approaches to mathematical reasoning. First, a numeracy-driven framework is proposed to enhance MWP solvers by integrating numerical reasoning into model training, surpassing human-level performance on benchmark datasets. Second, a novel multi-solution framework captures the diversity of valid solutions to math problems, improving the generalization capabilities of AI models. Third, a customized knowledge distillation technique, termed Customized Exercise for Math Learning (CEMAL), is developed to create tailored exercises for smaller models, significantly improving their efficiency and accuracy in solving MWPs. Additionally, a multi-view fine-tuning paradigm (MinT) is introduced to enable smaller models to handle diverse annotation styles from different datasets, improving their adaptability and generalization. To further advance mathematical reasoning, a benchmark, MathChat, is introduced to evaluate LLMs in multi-turn reasoning and instruction-following tasks, demonstrating significant performance improvements. Finally, new inference-time verifiers, Math-Rev and Code-Rev, are developed to enhance reasoning verification, combining language-based and code-based solutions for improved accuracy in both math and code reasoning tasks.
In summary, this dissertation provides a comprehensive exploration of these challenges and contributes novel solutions that push the boundaries of AI-driven mathematical reasoning. Potential future research directions are also discussed to further extend the impact of this dissertation.

Provider:

University of Notre Dame

Created:

2024-11-24
