DISFL-QA
Source: arXiv · Updated: 2021-06-08 · Indexed: 2024-06-21
Download link:
https://github.com/google-research-datasets/disfl-qa
Overview:
DISFL-QA is a question answering (QA) dataset created by Google Assistant for studying disfluencies in natural language processing (NLP). It contains approximately 12,000 questions, each carrying a contextual disfluency such as a repetition, correction, or restart. The disfluencies were added by hand to originally fluent questions: passages and questions were drawn from the SQuAD-v2 dataset, and expert raters inserted disfluencies that fit the context of each passage. The dataset is mainly used to test and improve the robustness of natural language understanding (NLU) models to disfluencies, particularly in zero-shot settings.
Provided by:
Google Assistant
Created:
2021-06-08



