NumGLUE Dataset
Source: paperswithcode.com, indexed 2025-03-22
Download link: https://paperswithcode.com/dataset/numglue
Resource description:
NumGLUE is a benchmark developed by the Allen Institute for AI. It evaluates how well AI systems perform mathematical reasoning over numbers that appear in natural-language text. Key details:
Purpose and Inspiration:
Drawing inspiration from the GLUE benchmark, which was designed for natural language understanding, NumGLUE aims to assess AI systems' ability to reason with numbers.
Unlike GLUE, which covers a wide range of NLP tasks, NumGLUE specifically targets tasks that require simple arithmetic understanding.
Tasks:
NumGLUE consists of eight tasks, each involving numerical reasoning:
- Commonsense + Arithmetic Reasoning
- Domain Specific + Arithmetic Reasoning
- Commonsense + Quantitative Comparison
- Fill-in-the-blanks Format
- Reading Comprehension (RC) + Explicit Numerical Reasoning
- Reading Comprehension (RC) + Implicit Numerical Reasoning
- Quantitative Natural Language Inference (NLI)
- Arithmetic Word Problems
Challenges and Performance:
NumGLUE remains unsolved: neural models, including state-of-the-art large-scale language models, perform significantly worse than humans, with an average gap of 46.4%.
The benchmark encourages knowledge sharing across tasks, especially those with limited training data: a model trained jointly on all tasks outperforms models trained on each task separately.
Importance:
NumGLUE promotes the development of systems capable of robust and general arithmetic reasoning within language.
It serves as a stepping stone toward more complex mathematical reasoning.
Provided by: Papers with Code



