all-nli

Name: all-nli
Creator: maas
Published: 2025-12-04 16:20:04
License: No description provided

ModelScope Community: updated 2025-12-04, indexed 2025-01-11

Download link:

https://modelscope.cn/datasets/sentence-transformers/all-nli

Resource summary:

# Dataset Card for AllNLI

This dataset is a concatenation of the [SNLI](https://huggingface.co/datasets/stanfordnlp/snli) and [MultiNLI](https://huggingface.co/datasets/nyu-mll/multi_nli) datasets. It was originally intended for Natural Language Inference (NLI), but it can also be used to train or fine-tune an embedding model for semantic textual similarity.

## Dataset Subsets

### `pair-class` subset

* Columns: "premise", "hypothesis", "label"
* Column types: `str`, `str`, `class` with `{"0": "entailment", "1": "neutral", "2": "contradiction"}`
* Examples:
  ```python
  {
      'premise': 'A person on a horse jumps over a broken down airplane.',
      'hypothesis': 'A person is training his horse for a competition.',
      'label': 1,
  }
  ```
* Collection strategy: Reading the premise, hypothesis and integer label from the SNLI & MultiNLI datasets.
* Deduplicated: Yes

### `pair-score` subset

* Columns: "sentence1", "sentence2", "score"
* Column types: `str`, `str`, `float`
* Examples:
  ```python
  {
      'sentence1': 'A person on a horse jumps over a broken down airplane.',
      'sentence2': 'A person is training his horse for a competition.',
      'score': 0.5,
  }
  ```
* Collection strategy: Taking the `pair-class` subset and remapping "entailment", "neutral" and "contradiction" to 1.0, 0.5 and 0.0, respectively.
* Deduplicated: Yes

### `pair` subset

* Columns: "anchor", "positive"
* Column types: `str`, `str`
* Examples:
  ```python
  {
      'anchor': 'A person on a horse jumps over a broken down airplane.',
      'positive': 'A person is outdoors, on a horse.',
  }
  ```
* Collection strategy: Reading the SNLI & MultiNLI datasets and, whenever the label is "entailment", taking the "premise" as the "anchor" and the "hypothesis" as the "positive". The reverse ("hypothesis" as "anchor" and "premise" as "positive") is not included.
* Deduplicated: Yes

### `triplet` subset

* Columns: "anchor", "positive", "negative"
* Column types: `str`, `str`, `str`
* Examples:
  ```python
  {
      'anchor': 'A person on a horse jumps over a broken down airplane.',
      'positive': 'A person is outdoors, on a horse.',
      'negative': 'A person is at a diner, ordering an omelette.',
  }
  ```
* Collection strategy: Reading the SNLI & MultiNLI datasets and, for each "premise", using the dataset labels to build a list of entailing sentences and a list of contradictory sentences. Every possible triplet is then formed from the premise (as "anchor"), one entailing sentence (as "positive") and one contradictory sentence (as "negative"). The reverse (an entailing sentence as "anchor" and the "premise" as "positive") is not included.
* Deduplicated: Yes
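The `pair-score` and `pair` collection strategies come down to a label remapping and an entailment filter over `pair-class` rows. The sketch below is a hypothetical illustration and not the tooling used to build the dataset. The function names `to_pair_score`, `to_pair` and `dedupe` are made up for this example, and rows are assumed to be plain dictionaries in the `pair-class` schema. The card does not say how duplicates were removed, so exact-match deduplication is only an assumption. The sample rows reuse the card's example sentences, with the `triplet` example's positive treated as the entailment hypothesis and its negative as the contradiction.

```python
# Hypothetical sketch of the `pair-score` and `pair` collection strategies.
# Assumption: rows follow the `pair-class` schema
# (premise: str, hypothesis: str, label: int, 0/1/2 = entailment/neutral/contradiction).

# Score mapping stated in the card: entailment -> 1.0, neutral -> 0.5, contradiction -> 0.0
LABEL_TO_SCORE = {0: 1.0, 1: 0.5, 2: 0.0}
ENTAILMENT = 0


def dedupe(rows):
    """Drop exact duplicate rows, keeping first occurrences (assumed dedup rule)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def to_pair_score(rows):
    """Remap each integer NLI label to a similarity score."""
    return dedupe([
        {
            "sentence1": row["premise"],
            "sentence2": row["hypothesis"],
            "score": LABEL_TO_SCORE[row["label"]],
        }
        for row in rows
    ])


def to_pair(rows):
    """Keep entailment rows only: premise -> anchor, hypothesis -> positive (no reverse pairs)."""
    return dedupe([
        {"anchor": row["premise"], "positive": row["hypothesis"]}
        for row in rows
        if row["label"] == ENTAILMENT
    ])


PREMISE = "A person on a horse jumps over a broken down airplane."
rows = [
    {"premise": PREMISE, "hypothesis": "A person is training his horse for a competition.", "label": 1},
    {"premise": PREMISE, "hypothesis": "A person is outdoors, on a horse.", "label": 0},
    {"premise": PREMISE, "hypothesis": "A person is at a diner, ordering an omelette.", "label": 2},
]

pair_score = to_pair_score(rows + rows)  # duplicated input collapses back to 3 rows
pair = to_pair(rows)
print([r["score"] for r in pair_score])  # [0.5, 1.0, 0.0]
print(pair)
```

Only the entailment row survives in `pair`. Neutral and contradiction rows still appear in `pair-score`, where they carry scores of 0.5 and 0.0.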
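The `triplet` collection strategy can be sketched in a similar way. Hypotheses are grouped by premise into entailing and contradictory lists, and the Cartesian product of the two lists is taken for each premise. This is again a hypothetical illustration. The name `to_triplets` and the exact-match deduplication are assumptions, and the input is the card's example sentences in the `pair-class` schema. Neutral rows are ignored.

```python
from collections import defaultdict
from itertools import product

ENTAILMENT, CONTRADICTION = 0, 2


def to_triplets(rows):
    """Build (anchor, positive, negative) triplets per premise.

    Hypothetical sketch: collect entailing and contradictory hypotheses for
    each premise, then emit every (entailing, contradictory) combination.
    Neutral rows are skipped; exact duplicate triplets are dropped.
    """
    entailing = defaultdict(list)
    contradicting = defaultdict(list)
    for row in rows:
        if row["label"] == ENTAILMENT:
            entailing[row["premise"]].append(row["hypothesis"])
        elif row["label"] == CONTRADICTION:
            contradicting[row["premise"]].append(row["hypothesis"])

    triplets, seen = [], set()
    for premise, positives in entailing.items():
        negatives = contradicting.get(premise, [])
        for positive, negative in product(positives, negatives):
            key = (premise, positive, negative)
            if key not in seen:
                seen.add(key)
                triplets.append({"anchor": premise, "positive": positive, "negative": negative})
    return triplets


PREMISE = "A person on a horse jumps over a broken down airplane."
rows = [
    {"premise": PREMISE, "hypothesis": "A person is training his horse for a competition.", "label": 1},
    {"premise": PREMISE, "hypothesis": "A person is outdoors, on a horse.", "label": 0},
    {"premise": PREMISE, "hypothesis": "A person is at a diner, ordering an omelette.", "label": 2},
]

triplets = to_triplets(rows + rows)  # duplicates in the input do not multiply triplets
print(triplets)
```

Every entailing sentence is paired with every contradictory sentence of the same premise. A premise with m entailing and n contradictory hypotheses therefore yields m × n triplets before deduplication, and a premise with no contradictory hypotheses yields none.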

Provider:

maas

Created:

2025-01-06
