UTMath/UTMath_Train
Source: Hugging Face. Updated 2024-11-24; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/UTMath/UTMath_Train
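Description:
UTMath is a rigorous and broad benchmark designed to evaluate the mathematical reasoning abilities of large language models (LLMs) through unit tests and the Reasoning-to-Coding of Thoughts (RCoT) approach. Each problem has 68 test cases on average, to ensure that a model genuinely solves the problem rather than merely memorizing answers. The dataset features multiple-case validation and true reasoning evaluation, and requires LLMs to output code so that their reasoning skills are better reflected. In addition, the UTMath-Train dataset contains more than 70,000 problem-solving samples, intended to support the community in further advancing research on mathematical reasoning.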
UTMath is a rigorous and expansive benchmark designed to evaluate the mathematical reasoning abilities of Large Language Models (LLMs). Through unit testing, it ensures that a model truly solves each problem rather than simply memorizing answers. UTMath emphasizes multiple-case validation and true reasoning evaluation, filtering out memorization and comparing solution efficiency through hard cases and runtime metrics. Additionally, UTMath introduces the Reasoning-to-Coding of Thoughts (RCoT) approach, which encourages LLMs to reason explicitly before generating code, thereby improving the efficiency and effectiveness of their solutions.
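
The unit-test idea described above can be illustrated with a minimal sketch. It is not the UTMath evaluation harness: the problem (triangular numbers), the function names `passes_all_cases`, `memorized`, and `reasoned`, and the case format are all assumptions chosen for illustration. It only shows why checking a generated solution against many cases separates a general solution from one that has memorized a few answers.

```python
from typing import Callable, List, Tuple


def passes_all_cases(solution: Callable[[int], int],
                     cases: List[Tuple[int, int]]) -> bool:
    """Return True only if `solution` gives the expected output on every case."""
    for n, expected in cases:
        try:
            if solution(n) != expected:
                return False
        except Exception:
            # A crash on any input counts as a failure.
            return False
    return True


# Hypothetical problem: return the n-th triangular number.
# 68 cases are used here only to echo the average reported in the description.
cases = [(n, n * (n + 1) // 2) for n in range(68)]


def memorized(n: int) -> int:
    """Knows only the first few answers, as a memorizing model might."""
    table = {0: 0, 1: 1, 2: 3, 3: 6}
    return table[n]  # raises KeyError for any n outside the table


def reasoned(n: int) -> int:
    """General closed-form solution."""
    return n * (n + 1) // 2


print("memorized passes:", passes_all_cases(memorized, cases))
print("reasoned passes:", passes_all_cases(reasoned, cases))
```

With a single test case such as n = 3, both functions would look correct; only the larger case set tells them apart.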
Provided by:
UTMath



