nyu-mll/multi_nli
Dataset Overview
Name: Multi-Genre Natural Language Inference (MultiNLI)
Language: English
License:
- cc-by-3.0
- cc-by-sa-3.0
- mit
- other
Multilinguality: monolingual
Size: 100K<n<1M
Source data: original
Task category: text classification
Task IDs:
- natural-language-inference
- multi-input-text-classification
Papers with Code ID: multinli
Pretty name: Multi-Genre Natural Language Inference
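Dataset Structure
Data Instances
The dataset contains the following fields:
- promptID: integer, unique identifier
- pairID: string, unique identifier
- premise: string
- premise_binary_parse: string
- premise_parse: string
- hypothesis: string
- hypothesis_binary_parse: string
- hypothesis_parse: string
- genre: string
- label: class label, one of entailment (0), neutral (1), contradiction (2)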
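The label encoding listed above can be decoded with a small helper. This is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` library is installed and that the repository id at the top of this card (`nyu-mll/multi_nli`) is loadable with `load_dataset`. The names `LABEL_NAMES`, `decode_label`, and `load_example` are illustrative and not part of the dataset.

```python
# Label ids as given in the field list of this card.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}


def decode_label(label_id: int) -> str:
    """Map an integer label to its class name; raise on unknown ids."""
    try:
        return LABEL_NAMES[label_id]
    except KeyError:
        raise ValueError(f"unexpected label id: {label_id}") from None


def load_example(index: int = 0) -> dict:
    """Fetch one training example with a readable label.

    Assumption: `datasets` is installed and the Hub is reachable.
    """
    # Imported lazily so the helpers above also work offline.
    from datasets import load_dataset

    train = load_dataset("nyu-mll/multi_nli", split="train")
    row = train[index]
    return {
        "premise": row["premise"],
        "hypothesis": row["hypothesis"],
        "genre": row["genre"],
        "label": decode_label(row["label"]),
    }
```

Calling `load_example()` downloads the training split on first use, so it is best run once and cached; `decode_label` needs no network access.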
Data Splits
- Training set: 392,702 examples
- Validation (matched) set: 9,815 examples
- Validation (mismatched) set: 9,832 examples
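The split sizes above can be used as a quick sanity check on a downloaded copy. The sketch below is hedged: the split keys `train`, `validation_matched`, and `validation_mismatched` are assumed to follow the Hugging Face naming for this dataset, and `SPLIT_SIZES`, `split_fractions`, and `check_split_sizes` are hypothetical helper names introduced only for illustration.

```python
# Documented split sizes from this card; the key names are an assumption.
SPLIT_SIZES = {
    "train": 392702,
    "validation_matched": 9815,
    "validation_mismatched": 9832,
}


def split_fractions(sizes: dict) -> dict:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def check_split_sizes(splits) -> list:
    """Return the names of splits that are missing or have an unexpected length.

    `splits` can be any mapping from split name to an object supporting len(),
    for example the result of datasets.load_dataset("nyu-mll/multi_nli").
    """
    return [
        name
        for name, expected in SPLIT_SIZES.items()
        if name not in splits or len(splits[name]) != expected
    ]
```

An empty list from `check_split_sizes` indicates that the loaded copy matches the counts listed on this card; a non-empty list names the splits worth inspecting.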
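Dataset Creation
Source Data
- Data collection: premise sentences were selected from existing text sources, and human annotators were asked to write a new sentence to pair with each premise as its hypothesis.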
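License Details
- Open portion: the license of the American National Corpus
- Fiction portion: several licenses, including the Creative Commons Share-Alike 3.0 Unported License and the Creative Commons Attribution 3.0 Unported License
Citation Information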
@InProceedings{N18-1101,
  author = "Williams, Adina and Nangia, Nikita and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}




