Adversarial NLI (ANLI) | Natural Language Processing Dataset | Adversarial Training Dataset
Adversarial NLI Dataset Overview
Dataset Versions
- Version 1.0 is available here: https://dl.fbaipublicfiles.com/anli/anli_v1.0.zip
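A minimal sketch of fetching and inspecting the archive, assuming only the URL above. The helper names `download_archive` and `list_jsonl_members` are hypothetical, and the sketch makes no assumption about the directory layout inside the zip: it lists whatever `.jsonl` members it finds.

```python
import io
import urllib.request
import zipfile

ANLI_URL = "https://dl.fbaipublicfiles.com/anli/anli_v1.0.zip"


def download_archive(url: str = ANLI_URL) -> bytes:
    """Download the archive and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_jsonl_members(archive_bytes: bytes) -> list[str]:
    """Return the sorted names of all .jsonl files inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".jsonl")
        )


# Example usage (performs a network request, so it is left commented out):
# for name in list_jsonl_members(download_archive()):
#     print(name)
```

Holding the archive in memory keeps the sketch short; for repeated use it may be preferable to write the bytes to disk once and open the file from there.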
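Data Format
- Data files are in JSONL format (one JSON object per line).
- Each example contains the following fields:
  - uid: a unique identifier.
  - premise: the premise.
  - hypothesis: the hypothesis.
  - label: the label.
  - reason: the reason that explains the label.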
Example
{ "uid": "8a91e1a2-9a32-4fd9-b1b6-bd2ee2287c8f", "premise": "Javier Torres (born May 14, 1988 in Artesia, California) is an undefeated Mexican American professional boxer in the Heavyweight division. Torres was the second rated U.S. amateur boxer in the Super Heavyweight division and a member of the Mexican Olympic team.", "hypothesis": "Javier was born in Mexico", "label": "c", "reason": "The paragraph states that Javier was born in the California, US." }
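The format described above can be read with the standard library alone. The following is a sketch, not an official loader: `parse_jsonl`, `REQUIRED_FIELDS`, and `LABEL_NAMES` are hypothetical names. The label mapping is an assumption, since only the value "c" appears in the example; `reason` is treated as optional.

```python
import json
from typing import Iterable, Iterator

# Fields every example is expected to carry; "reason" is treated as optional.
REQUIRED_FIELDS = ("uid", "premise", "hypothesis", "label")

# Assumption: single-letter labels. Only "c" appears in the example above.
LABEL_NAMES = {"e": "entailment", "n": "neutral", "c": "contradiction"}


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, checking the required fields."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        yield record
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `with open(path, encoding="utf-8") as f: examples = list(parse_jsonl(f))`, where `path` points at one of the extracted `.jsonl` files.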
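Field Notes
The reason field: every example in the dev and test sets includes a reason field, and some examples in the train set include it as well. The field is written by the annotator and explains why the statement belongs to the given category and why it is difficult for the system to judge.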
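To check how many examples in a given split carry a reason, one could compute the coverage directly. This is a sketch under assumptions: `reason_coverage` is a hypothetical helper, and because the source does not say whether a missing reason is stored as an absent key or an empty string, both cases are counted as missing.

```python
from typing import Iterable


def reason_coverage(examples: Iterable[dict]) -> float:
    """Return the fraction of examples with a non-empty `reason` string."""
    total = 0
    with_reason = 0
    for example in examples:
        total += 1
        reason = example.get("reason")
        # Count only real, non-blank strings as a provided reason.
        if isinstance(reason, str) and reason.strip():
            with_reason += 1
    return with_reason / total if total else 0.0
```

Given the description above, the dev and test splits would be expected to yield 1.0 and the train split a lower value.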
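Verifier Labels
- Every example in the dev and test sets was verified by 2 or 3 verifiers (a third was used when the first two disagreed).
- Additional verifier labels are available in verifier_labels/verifier_labels_R1-3.jsonl.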
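Error Analysis
- An in-depth error analysis of the dataset is available here: https://github.com/facebookresearch/anli/tree/main/anlizinganli
- It uses a fine-grained annotation scheme that labels different aspects of inference to explain the gold classification labels.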
Citation
Dataset
@inproceedings{nie-etal-2020-adversarial,
  title = "Adversarial {NLI}: A New Benchmark for Natural Language Understanding",
  author = "Nie, Yixin and Williams, Adina and Dinan, Emily and Bansal, Mohit and Weston, Jason and Kiela, Douwe",
  booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
  year = "2020",
  publisher = "Association for Computational Linguistics",
}
Error Analysis Annotations
@article{williams-etal-2020-anlizing,
  title = "ANLIzing the Adversarial Natural Language Inference Dataset",
  author = "Adina Williams and Tristan Thrush and Douwe Kiela",
  booktitle = "Proceedings of the 5th Annual Meeting of the Society for Computation in Linguistics",
  year = "2022",
  publisher = "Association for Computational Linguistics",
}

- Adversarial NLI: A New Benchmark for Natural Language Understanding. Facebook AI Research, 2020.
