NumGLUE
Source: arXiv. Indexed: 2025-09-30
Download link:
https://allenai.org/data/numglue
Overview:
NumGLUE is a multi-task benchmark that evaluates AI systems on eight distinct tasks, each of which requires a simple understanding of arithmetic. The benchmark encourages sharing knowledge across tasks and highlights the limitations of current AI systems in mathematical reasoning. Its core focus is mathematical reasoning and arithmetic understanding.



