five

anli

收藏
魔搭社区2025-12-05 更新2025-05-24 收录
下载链接:
https://modelscope.cn/datasets/facebook/anli
下载链接
链接失效反馈
官方服务:
资源简介:
# Dataset Card for "anli" ## Table of Contents - [Dataset Description](#dataset-description) - [Dataset Summary](#dataset-summary) - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards) - [Languages](#languages) - [Dataset Structure](#dataset-structure) - [Data Instances](#data-instances) - [Data Fields](#data-fields) - [Data Splits](#data-splits) - [Dataset Creation](#dataset-creation) - [Curation Rationale](#curation-rationale) - [Source Data](#source-data) - [Annotations](#annotations) - [Personal and Sensitive Information](#personal-and-sensitive-information) - [Considerations for Using the Data](#considerations-for-using-the-data) - [Social Impact of Dataset](#social-impact-of-dataset) - [Discussion of Biases](#discussion-of-biases) - [Other Known Limitations](#other-known-limitations) - [Additional Information](#additional-information) - [Dataset Curators](#dataset-curators) - [Licensing Information](#licensing-information) - [Citation Information](#citation-information) - [Contributions](#contributions) ## Dataset Description - **Homepage:** - **Repository:** [https://github.com/facebookresearch/anli/](https://github.com/facebookresearch/anli/) - **Paper:** [Adversarial NLI: A New Benchmark for Natural Language Understanding](https://arxiv.org/abs/1910.14599) - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) - **Size of downloaded dataset files:** 18.62 MB - **Size of the generated dataset:** 77.12 MB - **Total amount of disk used:** 95.75 MB ### Dataset Summary The Adversarial Natural Language Inference (ANLI) is a new large-scale NLI benchmark dataset, The dataset is collected via an iterative, adversarial human-and-model-in-the-loop procedure. ANLI is much more difficult than its predecessors including SNLI and MNLI. It contains three rounds. Each round has train/dev/test splits. ### Supported Tasks and Leaderboards [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Languages English ## Dataset Structure ### Data Instances #### plain_text - **Size of downloaded dataset files:** 18.62 MB - **Size of the generated dataset:** 77.12 MB - **Total amount of disk used:** 95.75 MB An example of 'train_r2' looks as follows. ``` This example was too long and was cropped: { "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.", "label": 0, "premise": "\"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...", "reason": "", "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712" } ``` ### Data Fields The data fields are the same among all splits. #### plain_text - `uid`: a `string` feature. - `premise`: a `string` feature. - `hypothesis`: a `string` feature. - `label`: a classification label, with possible values including `entailment` (0), `neutral` (1), `contradiction` (2). - `reason`: a `string` feature. ### Data Splits | name |train_r1|dev_r1|train_r2|dev_r2|train_r3|dev_r3|test_r1|test_r2|test_r3| |----------|-------:|-----:|-------:|-----:|-------:|-----:|------:|------:|------:| |plain_text| 16946| 1000| 45460| 1000| 100459| 1200| 1000| 1000| 1200| ## Dataset Creation ### Curation Rationale [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Source Data #### Initial Data Collection and Normalization [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### Who are the source language producers? [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Annotations #### Annotation process [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### Who are the annotators? [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Personal and Sensitive Information [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Considerations for Using the Data ### Social Impact of Dataset [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Discussion of Biases [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Other Known Limitations [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Additional Information ### Dataset Curators [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Licensing Information [cc-4 Attribution-NonCommercial](https://github.com/facebookresearch/anli/blob/main/LICENSE) ### Citation Information ``` @InProceedings{nie2019adversarial, title={Adversarial NLI: A New Benchmark for Natural Language Understanding}, author={Nie, Yixin and Williams, Adina and Dinan, Emily and Bansal, Mohit and Weston, Jason and Kiela, Douwe}, booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics", year = "2020", publisher = "Association for Computational Linguistics", } ``` ### Contributions Thanks to [@thomwolf](https://github.com/thomwolf), [@easonnie](https://github.com/easonnie), [@lhoestq](https://github.com/lhoestq), [@patrickvonplaten](https://github.com/patrickvonplaten) for adding this dataset.

# 「对抗式自然语言推理(Adversarial Natural Language Inference,简称ANLI)」数据集卡片 ## 目录 - [数据集描述](#dataset-description) - [数据集摘要](#dataset-summary) - [支持任务与排行榜](#supported-tasks-and-leaderboards) - [语言](#languages) - [数据集结构](#dataset-structure) - [数据实例](#data-instances) - [数据字段](#data-fields) - [数据划分](#data-splits) - [数据集构建](#dataset-creation) - [采集依据](#curation-rationale) - [源数据](#source-data) - [标注信息](#annotations) - [个人与敏感信息](#personal-and-sensitive-information) - [数据集使用注意事项](#considerations-for-using-the-data) - [数据集的社会影响](#social-impact-of-dataset) - [偏差讨论](#discussion-of-biases) - [其他已知局限性](#other-known-limitations) - [附加信息](#additional-information) - [数据集维护者](#dataset-curators) - [许可信息](#licensing-information) - [引用信息](#citation-information) - [贡献致谢](#contributions) ## 数据集描述 - **主页:** - **代码仓库:** [https://github.com/facebookresearch/anli/](https://github.com/facebookresearch/anli/) - **论文:** [《对抗式自然语言推理:自然语言理解的全新基准》](https://arxiv.org/abs/1910.14599) - **联系方式:** [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) - **下载数据集文件大小:** 18.62 MB - **生成后数据集大小:** 77.12 MB - **总磁盘占用:** 95.75 MB ### 数据集摘要 对抗式自然语言推理(Adversarial Natural Language Inference,简称ANLI)是一款全新的大规模自然语言推理(Natural Language Inference,简称NLI)基准数据集。该数据集通过迭代式的人机协同对抗采集流程构建完成。ANLI的难度远高于其前代数据集,包括斯坦福自然语言推理数据集(Stanford Natural Language Inference,简称SNLI)和多领域自然语言推理数据集(Multi-Genre Natural Language Inference,简称MNLI)。该数据集包含三个轮次,每个轮次均设有训练集、开发集与测试集划分。 ### 支持任务与排行榜 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### 语言 英语 ## 数据集结构 ### 数据实例 #### 纯文本格式 - **下载数据集文件大小:** 18.62 MB - **生成后数据集大小:** 77.12 MB - **总磁盘占用:** 95.75 MB `train_r2`的一个示例如下所示: 该示例过长,已被截断: { "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.", "label": 0, "premise": ""Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...", "reason": "", "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712" } ### 数据字段 所有划分的数据字段格式保持一致。 #### 纯文本格式 - `uid`:字符串类型特征。 - `premise`:字符串类型特征。 - `hypothesis`:字符串类型特征。 - `label`:分类标签,可选值包括`entailment`(蕴含,0)、`neutral`(中立,1)、`contradiction`(矛盾,2)。 - `reason`:字符串类型特征。 ### 数据划分 | 名称 | train_r1 | dev_r1 | train_r2 | dev_r2 | train_r3 | dev_r3 | test_r1 | test_r2 | test_r3 | |----------|---------:|-------:|---------:|-------:|---------:|-------:|--------:|--------:|--------:| | 纯文本格式| 16946| 1000| 45460| 1000| 100459| 1200| 1000| 1000| 1200| ## 数据集构建 ### 采集依据 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### 源数据 #### 初始数据采集与标准化 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### 源语言生成者是谁? [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### 标注信息 #### 标注流程 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### 标注人员是谁? [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### 个人与敏感信息 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## 数据集使用注意事项 ### 数据集的社会影响 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### 偏差讨论 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### 其他已知局限性 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## 附加信息 ### 数据集维护者 [需补充更多信息](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### 许可信息 [知识共享署名-非商业性使用许可(CC BY-NC 4.0)](https://github.com/facebookresearch/anli/blob/main/LICENSE) ### 引用信息 @InProceedings{nie2019adversarial, title={Adversarial NLI: A New Benchmark for Natural Language Understanding}, author={Nie, Yixin and Williams, Adina and Dinan, Emily and Bansal, Mohit and Weston, Jason and Kiela, Douwe}, booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics", year = "2020", publisher = "Association for Computational Linguistics", } ### 贡献致谢 感谢[@thomwolf](https://github.com/thomwolf)、[@easonnie](https://github.com/easonnie)、[@lhoestq](https://github.com/lhoestq)、[@patrickvonplaten](https://github.com/patrickvonplaten) 为本数据集的添加提供支持。
提供机构:
maas
创建时间:
2025-05-20
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作