stanfordnlp/squad_adversarial

Name: stanfordnlp/squad_adversarial
Creator: stanfordnlp
Published: 2024-01-18 11:16:12
License: MIT

Source: Hugging Face. Updated 2024-01-18; indexed 2024-05-25.

Download link:

https://hf-mirror.com/datasets/stanfordnlp/squad_adversarial

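Resource summary:

This dataset, Adversarial Examples for SQuAD, is used mainly to evaluate how robust reading comprehension systems are to adversarial input. It is built on the SQuAD development set, with adversarial sentences added to test how systems perform when distracting sentences are present. It has two configurations, AddSent and AddOneSent, containing 3,560 and 1,787 question-answer pairs respectively. Its fields are id, title, context, question, and answers, in the same format as SQuAD. The language is English and the license is MIT.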

Provider:

stanfordnlp

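Summary of original information

Dataset overview

Name: Adversarial Examples for SQuAD
Language: English
License: MIT
Multilinguality: monolingual
Size: 1K<n<10K
Source dataset: extended from SQuAD
Task category: question answering
Task ID: extractive-qa
Dataset configurations: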

squad_adversarial
AddSent
AddOneSent

Dataset structure

Data instance

{
    "answers": {
        "answer_start": [334, 334, 334],
        "text": ["February 7, 2016", "February 7", "February 7, 2016"]
    },
    "context": "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50. The Champ Bowl was played on August 18th,1991.",
    "id": "56bea9923aeaaa14008c91bb-high-conf-turk2",
    "question": "What day was the Super Bowl played on?",
    "title": "Super_Bowl_50"
}
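The id in the instance above carries a suffix after the 24-character identifier. A minimal sketch of separating the two parts follows. It assumes, without confirmation from this page, that the part before the first hyphen is the id of the original SQuAD question and that the rest labels the adversarial variant; `split_adversarial_id` is a hypothetical helper name.

```python
def split_adversarial_id(example_id: str) -> tuple[str, str]:
    """Split an id into (assumed original SQuAD id, variant suffix).

    Assumption: everything before the first hyphen is the original id.
    An id with no hyphen yields an empty suffix.
    """
    base, _, suffix = example_id.partition("-")
    return base, suffix


base_id, suffix = split_adversarial_id("56bea9923aeaaa14008c91bb-high-conf-turk2")
print(base_id)
print(suffix)
```

Grouping examples by the first element would then let adversarial variants be compared against the question they were derived from.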

Data fields

{
    "id": Value(dtype="string", id=None),
    "title": Value(dtype="string", id=None),
    "context": Value(dtype="string", id=None),
    "question": Value(dtype="string", id=None),
    "answers": Sequence(
        feature={
            "text": Value(dtype="string", id=None),
            "answer_start": Value(dtype="int32", id=None),
        },
        length=-1,
        id=None,
    ),
}
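Because each record pairs answer texts with character offsets into the context, a quick consistency check can be useful before evaluation. The sketch below uses only the field names listed above; `check_record` and the shortened toy record are illustrative and not part of the dataset or any library.

```python
REQUIRED_FIELDS = ("id", "title", "context", "question", "answers")


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in one SQuAD-style record."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    if len(texts) != len(starts):
        problems.append("text and answer_start differ in length")
    for text, start in zip(texts, starts):
        # Each answer must appear in the context exactly at its offset.
        if record["context"][start:start + len(text)] != text:
            problems.append(f"answer {text!r} not found at offset {start}")
    return problems


# Toy record with a shortened context; offsets are computed, not copied.
context = ("The game was played on February 7, 2016. "
           "The Champ Bowl was played on August 18th,1991.")
start = context.index("February 7, 2016")
toy_record = {
    "id": "toy-1",
    "title": "Super_Bowl_50",
    "context": context,
    "question": "What day was the Super Bowl played on?",
    "answers": {"text": ["February 7, 2016", "February 7"],
                "answer_start": [start, start]},
}
print(check_record(toy_record))
```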

Data splits

AddSent: 3,560 examples, 3,803,551 bytes in total.
AddOneSent: 1,787 examples, 1,864,767 bytes in total.

Dataset creation

Source data

Original data: SQuAD dev set
Processing: adversarial sentences added

License information

License: MIT License

Citation information

@inproceedings{jia-liang-2017-adversarial,
    title = "Adversarial Examples for Evaluating Reading Comprehension Systems",
    author = "Jia, Robin and Liang, Percy",
    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D17-1215",
    doi = "10.18653/v1/D17-1215",
    pages = "2021--2031",
    abstract = "Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer or misleading humans. In this adversarial setting, the accuracy of sixteen published models drops from an average of 75% F1 score to 36%; when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to 7%. We hope our insights will motivate the development of new models that understand language more precisely.",
}

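Collected summary

Dataset introduction

Construction

In machine reading comprehension, it is important to evaluate how deeply a model understands language. This dataset is built on the development set of the Stanford Question Answering Dataset (SQuAD): adversarial sentences are generated automatically and inserted into the original passages to form new contexts. The sentences are meant to distract computer systems without changing the correct answer or misleading human readers. Construction takes contexts from the original SQuAD data, algorithmically generates distractor sentences that overlap lexically with the question but contain no answer information, and yields a collection of adversarial examples in two configurations, AddSent and AddOneSent.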
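In the data instance shown earlier, the distractor is the final sentence of the context. Assuming the distractor is appended at the end of the passage (an assumption based on that one instance), the original answer offsets stay valid. The sketch below illustrates only this insertion step, not how distractors are generated; `add_distractor` is a hypothetical helper.

```python
def add_distractor(record: dict, distractor: str) -> dict:
    """Return a copy of the record with the distractor appended to the context.

    Appending at the end leaves every earlier character offset unchanged,
    so the answers field can be reused as-is.
    """
    adversarial = dict(record)  # shallow copy; the answers dict is shared
    adversarial["context"] = record["context"].rstrip() + " " + distractor
    return adversarial


context = ("The game was played on February 7, 2016, at Levi's Stadium in the "
           "San Francisco Bay Area at Santa Clara, California.")
answer = "February 7, 2016"
original = {
    "context": context,
    "question": "What day was the Super Bowl played on?",
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}
adversarial = add_distractor(original, "The Champ Bowl was played on August 18th,1991.")

start = adversarial["answers"]["answer_start"][0]
print(adversarial["context"][start:start + len(answer)])
```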

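Features

The defining feature of this dataset is its adversarial design, intended specifically to test the robustness of machine reading comprehension models. It has two configurations: AddSent provides up to five candidate adversarial sentences per context, while AddOneSent picks one adversarial sentence at random. The adversarial sentences share many words with the question but do not supply the answer, which challenges a model's ability to tell relevant information from distractors. The dataset is moderate in size, with several thousand question-answer pairs, and keeps the same data structure as the original SQuAD, so it can be plugged directly into existing evaluation pipelines.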
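The two properties described above, lexical overlap with the question and random selection of a single candidate, can be illustrated with a small sketch. The tokenizer is a deliberately simple assumption, the second candidate sentence is invented for illustration, and `pick_one` only mimics the idea of AddOneSent rather than reproducing how the dataset was actually built.

```python
import random
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set (a simplistic tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_words(question: str, sentence: str) -> set[str]:
    """Words that appear in both the question and the sentence."""
    return tokens(question) & tokens(sentence)


def pick_one(candidates: list[str], seed: int = 0) -> str:
    """Choose a single candidate at random, in the spirit of AddOneSent."""
    return random.Random(seed).choice(candidates)


question = "What day was the Super Bowl played on?"
candidates = [
    "The Champ Bowl was played on August 18th,1991.",           # from the instance above
    "The Champ Bowl was played on the day of August 18,1991.",  # hypothetical variant
]

print(sorted(shared_words(question, candidates[0])))
print(pick_one(candidates))
```

The first candidate shares several words with the question while containing none of the tokens of the gold answer "February 7, 2016".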

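Usage

In natural language processing research, this dataset is used mainly for adversarial evaluation, to expose the limits of models on harder input. Researchers can use it as a validation set to test how an already trained question answering model performs under adversarial distraction. Before use, load one of the configurations, AddSent or AddOneSent, and process the context, question, and answers fields in the standard SQuAD format. Comparing a model's scores on the original SQuAD and on this adversarial dataset helps pinpoint where its comprehension fails, which in turn can guide work on more robust algorithms.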
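The comparison described above is usually made with SQuAD-style exact match (EM) and token-level F1. The sketch below follows the commonly used SQuAD v1.1 answer normalization (lowercasing, stripping punctuation and articles) as I understand it; it is not the official evaluation script. Loading the data (for example with the Hugging Face `datasets` library) is left out because it needs network access, and the two predictions are hypothetical model outputs.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))


def f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_scores(prediction: str, golds: list[str]) -> tuple[float, float]:
    """Score against every gold answer and keep the best EM and F1."""
    return (max(exact_match(prediction, g) for g in golds),
            max(f1(prediction, g) for g in golds))


golds = ["February 7, 2016", "February 7", "February 7, 2016"]
clean_em, clean_f1 = best_scores("February 7, 2016", golds)   # hypothetical output on SQuAD
adv_em, adv_f1 = best_scores("August 18th,1991", golds)       # hypothetical output when distracted
print("F1 drop:", clean_f1 - adv_f1)
```

Averaging these per-question scores over the original and the adversarial examples gives the two numbers whose gap indicates how much the distractors hurt the model.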

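Background and challenges

Background

In natural language processing, the evaluation of machine reading comprehension systems has long relied on standard accuracy metrics, which do not fully capture how deeply a model understands language. In response, Stanford University researchers Robin Jia and Percy Liang created the adversarial SQuAD dataset in 2017 as an extension of the Stanford Question Answering Dataset (SQuAD). Its central research question is whether existing reading comprehension models actually understand language or merely rely on surface pattern matching. By automatically inserting adversarial sentences into the original text, sentences designed to distract computer systems without affecting human judgment, it provides a more demanding evaluation setting. The work helped establish adversarial evaluation as a method in natural language processing and encouraged researchers to develop more robust models that understand language more precisely.

Current challenges

The dataset addresses the challenge of evaluating model robustness in machine reading comprehension. Conventional models score well on standard test sets, but their performance often drops sharply under carefully designed adversarial distraction, which shows that they depend on shallow linguistic features rather than deep semantic understanding. The main difficulty in constructing the dataset is automatically generating adversarial sentences that do not affect correct human judgment yet still mislead computer systems. The generated sentences must share many words with the question while being semantically irrelevant to it, and they must remain fluent and natural, without grammatical errors or obviously implausible content, so that the evaluation stays fair and targeted.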

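Common scenarios

Typical use cases

In machine reading comprehension, the adversarial SQuAD dataset is widely used to evaluate how well models understand the deeper semantics of language. It inserts adversarial sentences into the original text; these sentences share words with the question but are semantically irrelevant, and they test whether a model can resist distraction and still extract the correct answer. Typical use cases include training question answering systems for robustness and validating it, in particular checking whether a model keeps stable performance when faced with carefully designed semantic distractors.

Practical applications

In practice, the adversarial SQuAD dataset can be used to strengthen the question answering modules of intelligent customer service, search engines, and document analysis systems. Optimizing a model with this dataset helps a system handle redundant or misleading information in user input and improves accuracy and stability in complex real-world settings. For example, in automatic question answering systems for the legal or medical domain, adversarial training can reduce wrong answers caused by distracting text and make the service more professional and trustworthy.

Derived and related work

The dataset has led to a body of follow-up research, including new adversarial training methods, frameworks for evaluating model robustness, and interpretability analysis techniques. For example, adversarial attack strategies based on this dataset have been widely used to probe the weaknesses of pretrained models such as BERT and GPT, which in turn has produced defenses such as adversarial data augmentation and dynamic adversarial training. Together, this work has pushed natural language processing toward more robust and more interpretable models.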

The content above was collected and summarized by 遇见数据集.
