UTMath/UTMath
Hugging Face | Updated 2024-11-24 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/UTMath/UTMath
Description:
UTMath is a rigorous and expansive benchmark for evaluating the mathematical reasoning abilities of Large Language Models (LLMs). It uses unit tests to check that a model actually solves each problem rather than simply memorizing answers. The dataset contains 1053 problems covering 9 mathematical domains, and each problem includes over 68 test cases. UTMath encourages the Reasoning-to-Coding of Thoughts (RCoT) approach, which requires LLMs to reason explicitly before generating code, thereby improving the efficiency and effectiveness of the solutions. The dataset also provides methods for evaluating model performance on UTMath and reports the performance of different models across the mathematical domains.
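The unit-test idea described above can be illustrated with a minimal sketch. This is not UTMath's actual evaluation harness: the helper `passes_all`, the toy problem (the n-th triangular number), and both candidate solutions are hypothetical and chosen only to show why a solution that memorizes a few answers fails once it is checked against many test cases.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical test-case shape for illustration: (input n, expected output).
TestCase = Tuple[int, int]


def passes_all(solution: Callable[[int], int], cases: Iterable[TestCase]) -> bool:
    """Return True only if the solution is correct on every test case."""
    for n, expected in cases:
        try:
            if solution(n) != expected:
                return False
        except Exception:
            # A crash on any input counts as a failure.
            return False
    return True


# Toy problem (an assumption, not taken from the dataset): the n-th triangular
# number. Reference outputs are computed by direct summation over 70 inputs.
CASES = [(n, sum(range(n + 1))) for n in range(70)]


def memorized(n: int) -> int:
    """Looks up a handful of known answers; raises KeyError on anything else."""
    known = {0: 0, 1: 1, 2: 3, 3: 6}
    return known[n]


def general(n: int) -> int:
    """Closed-form solution that works for every non-negative n."""
    return n * (n + 1) // 2


print(passes_all(memorized, CASES))  # False
print(passes_all(general, CASES))    # True
```

Requiring every case to pass, rather than a single final answer, is what separates the general solution from the lookup table in this sketch.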
Provider:
UTMath



