
jon-tow/feasibility_qa

Source: Hugging Face. Updated 2023-12-24; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/jon-tow/feasibility_qa
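Overview:
FeasibilityQA is a dataset of questions that require an understanding of feasibility. It comprises two question types: binary classification questions (BCQ) and multi-choice multi-correct questions (MCQ). In BCQ, the task is to decide, given a context, whether the question is feasible; in MCQ, the task is to select all feasible answers to a given question.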

Provider:
jon-tow
Summary of the original dataset card

feasibility_qa

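Dataset Details

Dataset Description

FeasibilityQA is a dataset of questions that require an understanding of feasibility. It contains two types of questions: binary classification questions (BCQ) and multi-choice multi-correct questions (MCQ). In BCQ, the task is to determine whether the question is feasible given the context; in MCQ, the task is to select all feasible answers to the given question.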
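To make the two task formats concrete, here is a minimal scoring sketch. It rests on assumptions that this page does not confirm: BCQ predictions are treated as booleans compared against a gold label, and an MCQ prediction counts as correct only when the selected set of options equals the gold set exactly. The function names `bcq_accuracy` and `mcq_exact_match` are hypothetical and are not taken from the dataset or the paper.

```python
from typing import Iterable, Sequence


def bcq_accuracy(predictions: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of binary feasibility judgements that match the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def mcq_exact_match(
    predictions: Sequence[Iterable[str]], gold: Sequence[Iterable[str]]
) -> float:
    """Fraction of questions whose selected option set equals the gold set.

    Assumption: a multi-correct answer is scored all-or-nothing, so choosing
    a subset or a superset of the feasible options counts as incorrect.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(set(p) == set(g) for p, g in zip(predictions, gold))
    return hits / len(gold)


if __name__ == "__main__":
    # Toy labels made up for illustration; they are not drawn from FeasibilityQA.
    print(bcq_accuracy([True, False, True, True], [True, True, True, False]))
    print(mcq_exact_match([["A", "C"], ["B"]], [["C", "A"], ["B", "D"]]))
```

Option order is ignored because each selection is converted to a set before comparison. A partial-credit metric would be an equally plausible choice if the all-or-nothing assumption does not fit.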

Dataset Configurations

  • BCQ configuration: data file FeasibilityQA_dataset_BCQ.csv
  • MCQ configuration: data file FeasibilityQA_dataset_MCQ.csv
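The two configurations above could be loaded with the Hugging Face `datasets` library roughly as follows. This is a sketch under assumptions: the configuration names are taken to be exactly `BCQ` and `MCQ`, the helper `load_feasibility_qa` is hypothetical, and the split names and column names are not stated on this page, so the code does not rely on them.

```python
# Mapping of configuration name to the CSV file listed for it.
CONFIG_FILES = {
    "BCQ": "FeasibilityQA_dataset_BCQ.csv",
    "MCQ": "FeasibilityQA_dataset_MCQ.csv",
}


def load_feasibility_qa(config: str):
    """Load one FeasibilityQA configuration from the Hugging Face Hub.

    Assumes the configuration names on the Hub are exactly "BCQ" and "MCQ".
    Requires the `datasets` package and network access when called.
    """
    if config not in CONFIG_FILES:
        raise ValueError(
            f"unknown config {config!r}; expected one of {sorted(CONFIG_FILES)}"
        )
    # Imported lazily so the mapping above is usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("jon-tow/feasibility_qa", config)


if __name__ == "__main__":
    dataset = load_feasibility_qa("BCQ")
    # Print the returned object to see which splits and columns are available.
    print(dataset)
```

Printing the returned object is the safest first step, since it shows the actual splits and columns before any code depends on them.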

Dataset Sources

  • Repository: https://github.com/kevinscaria/feasibilityQA
  • Paper: "John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility. (Gupta et al., 2022)

Citation

@inproceedings{gupta-etal-2023-john,
    title = "{``}John is 50 years old, can his son be 65?{''} Evaluating {NLP} Models{'} Understanding of Feasibility",
    author = "Gupta, Himanshu and Varshney, Neeraj and Mishra, Swaroop and Pal, Kuntal Kumar and Sawant, Saurabh Arjun and Scaria, Kevin and Goyal, Siddharth and Baral, Chitta",
    editor = "Vlachos, Andreas and Augenstein, Isabelle",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-main.30",
    doi = "10.18653/v1/2023.eacl-main.30",
    pages = "407--417",
    abstract = "In current NLP research, large-scale language models and their abilities are widely being discussed. Some recent works have also found notable failures of these models. Often these failure examples involve complex reasoning abilities. This work focuses on a simple commonsense ability, reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-answering dataset involving binary classification (BCQ) and multi-choice multi-correct questions (MCQ) that test understanding of feasibility. We show that even state-of-the-art models such as GPT-3, GPT-2, and T5 struggle to answer the feasibility questions correctly. Specifically, on (MCQ, BCQ) questions, GPT-3 achieves accuracy of just (19{\%}, 62{\%}) and (25{\%}, 64{\%}) in zero-shot and few-shot settings, respectively. We also evaluate models by providing relevant knowledge statements required to answer the question and find that the additional knowledge leads to a 7{\%} gain in performance, but the overall performance still remains low. These results make one wonder how much commonsense knowledge about action feasibility is encoded in state-of-the-art models and how well they can reason about it.",
}
