plaguss/snli-small

Name: plaguss/snli-small
Creator: plaguss
Published: 2023-09-10 14:53:06
License: No description available

Hugging Face | Updated: 2023-09-10 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/plaguss/snli-small

Resource summary:

This dataset was created with Argilla. It includes a configuration file, `argilla.yaml`, that follows the Argilla dataset format, along with records compatible with the HuggingFace `datasets` library. The dataset structure has four components: fields, questions, suggestions, and guidelines. Fields are the dataset records themselves; currently only text fields are supported. Questions are posed to annotators and can be of rating, text, single-choice, or multiple-choice type. Suggestions are auxiliary information linked to existing questions, intended to assist the annotation process. Guidelines are plain-text instructions for annotators. The dataset contains a single split named `train`.

Provided by:

plaguss

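Original information summary

Dataset Card for snli-small

Dataset Description

Dataset Summary

This dataset contains:

- A configuration file, `argilla.yaml`, that follows the Argilla dataset format and is used to configure the dataset when it is loaded with `FeedbackDataset.from_huggingface`.
- Dataset records in a format compatible with HuggingFace `datasets`. These records are loaded automatically by `FeedbackDataset.from_huggingface` and can also be loaded independently with the `datasets` library.
- The annotation guidelines used to build and curate the dataset, if they were defined in Argilla.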

How to Load

Load with Argilla

Install Argilla:

```bash
pip install argilla --upgrade
```

Then load the dataset:

```python
import argilla as rg

ds = rg.FeedbackDataset.from_huggingface("plaguss/snli-small")
```

Load with `datasets`

Install `datasets`:

```bash
pip install datasets --upgrade
```

Then load the dataset:

```python
from datasets import load_dataset

ds = load_dataset("plaguss/snli-small")
```

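Supported Tasks and Leaderboards

This dataset can contain multiple fields, questions, and responses, so it can be used for different NLP tasks depending on its configuration, which is described in the Dataset Structure section. As configured here, with a premise, a hypothesis, and a three-way label question, it is suited to natural language inference (NLI).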
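For NLI training, each row in the flat `datasets` layout (where the `label` column holds a list of annotator responses, each with a `status` and a string `value`) would need to be reduced to a single `(premise, hypothesis, label)` triple. The sketch below is one way to do that; the helper name `to_nli_triples` and the rule of keeping only the first `submitted` response are assumptions for illustration, not something the dataset card prescribes.

```python
from typing import Any, Iterable, Iterator


def to_nli_triples(
    rows: Iterable[dict[str, Any]],
) -> Iterator[tuple[str, str, int]]:
    """Yield (premise, hypothesis, label) for rows with a submitted response.

    Hypothetical helper: it keeps the FIRST response whose status is
    "submitted" (an assumed policy) and skips rows that have none.
    """
    for row in rows:
        responses = row.get("label") or []
        submitted = [r for r in responses if r.get("status") == "submitted"]
        if not submitted:
            continue  # no usable annotation for this record
        # Response values are stored as strings such as "1"; cast to int.
        yield row["premise"], row["hypothesis"], int(submitted[0]["value"])


rows = [
    {
        "premise": "A person on a horse jumps over a broken down airplane.",
        "hypothesis": "A person is training his horse for a competition.",
        "label": [{"status": "submitted", "user_id": None, "value": "1"}],
    },
    {"premise": "p", "hypothesis": "h", "label": []},  # unannotated, skipped
]
print(list(to_nli_triples(rows)))
```

With several annotators per record, a majority vote over submitted responses would be a natural alternative to the first-response rule.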

There are no leaderboards associated with this dataset.

Languages

[More Information Needed]

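Dataset Structure

Data in Argilla

In Argilla, the dataset contains the following:

Fields: the dataset records themselves. Currently, only text fields are supported.

| Field Name | Title | Type | Required | Markdown |
| --- | --- | --- | --- | --- |
| premise | Premise | TextField | True | False |
| hypothesis | Hypothesis | TextField | True | False |

Questions: the questions asked to annotators. They can be of different types, such as rating, text, single choice, or multiple choice.

| Question Name | Title | Type | Required | Description | Values/Labels |
| --- | --- | --- | --- | --- | --- |
| label | Does the hypothesis entail the premise, neither entail nor contradict it, or contradict it? | LabelQuestion | True | N/A | [0, 1, 2] |

Suggestions: optional suggestions linked to existing questions, containing the suggested value and its metadata.

| Suggestion Name | Type | Allowed Values |
| --- | --- | --- |
| label-suggestion | label_selection | [0, 1, 2] |

Guidelines: a plain-text string used to give instructions to the annotators.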
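The question table lists the allowed values `[0, 1, 2]` without naming the classes. Assuming this subset keeps the label convention of the original SNLI dataset on the Hugging Face Hub (0 = entailment, 1 = neutral, 2 = contradiction), a small lookup could decode them. The mapping is an assumption, since this card does not state it. Responses store the value as a string (e.g. `"1"`), so the sketch accepts both forms.

```python
from typing import Union

# ASSUMED mapping, borrowed from the original SNLI label convention;
# the snli-small card itself does not document it.
SNLI_LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}


def label_name(value: Union[int, str]) -> str:
    """Map a label value such as 1 or "1" to its (assumed) class name."""
    key = int(value)  # response values are stored as strings like "1"
    if key not in SNLI_LABELS:
        raise ValueError(
            f"label must be one of {sorted(SNLI_LABELS)}, got {value!r}"
        )
    return SNLI_LABELS[key]


print(label_name("1"))  # → neutral
```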

Data Instances

An example of a dataset instance in Argilla looks as follows:

```json
{
  "fields": {
    "hypothesis": "A person is training his horse for a competition.",
    "premise": "A person on a horse jumps over a broken down airplane."
  },
  "metadata": {},
  "responses": [
    {
      "status": "submitted",
      "values": {
        "label": {
          "value": "1"
        }
      }
    }
  ],
  "suggestions": []
}
```

The same record in HuggingFace `datasets` looks as follows:

```json
{
  "external_id": null,
  "hypothesis": "A person is training his horse for a competition.",
  "label": [
    {
      "status": "submitted",
      "user_id": null,
      "value": "1"
    }
  ],
  "label-suggestion": null,
  "label-suggestion-metadata": {
    "agent": null,
    "score": null,
    "type": null
  },
  "metadata": "{}",
  "premise": "A person on a horse jumps over a broken down airplane."
}
```
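The two record examples show the same data in two layouts. The sketch below spells out that correspondence by flattening an Argilla-style record into the `datasets` row: each field becomes a column, each response becomes an entry in the `label` list, the suggestion and its metadata get their own columns, and `metadata` is serialized to a JSON string. The function name `argilla_to_hf_row` is hypothetical, and the suggestion keys `question_name`, `value`, `agent`, `score`, and `type` are assumptions (the example record has no suggestions). In practice, Argilla's own export path should be preferred; this only illustrates the mapping.

```python
import json
from typing import Any


def argilla_to_hf_row(
    record: dict[str, Any], question: str = "label"
) -> dict[str, Any]:
    """Flatten an Argilla-style record into the flat `datasets` row layout.

    Hypothetical helper for illustration; suggestion keys are assumed.
    """
    row: dict[str, Any] = {"external_id": record.get("external_id")}
    row.update(record["fields"])  # premise, hypothesis -> top-level columns
    # One entry per response that answered `question`.
    row[question] = [
        {
            "status": resp["status"],
            "user_id": resp.get("user_id"),
            "value": resp["values"][question]["value"],
        }
        for resp in record.get("responses", [])
        if question in resp.get("values", {})
    ]
    # First suggestion for this question, if any (ASSUMED `question_name` key).
    suggestion = next(
        (s for s in record.get("suggestions", [])
         if s.get("question_name") == question),
        {},
    )
    row[f"{question}-suggestion"] = suggestion.get("value")
    row[f"{question}-suggestion-metadata"] = {
        key: suggestion.get(key) for key in ("agent", "score", "type")
    }
    # Metadata is stored as a JSON-encoded string in the flat layout.
    row["metadata"] = json.dumps(record.get("metadata", {}))
    return row


argilla_record = {
    "fields": {
        "hypothesis": "A person is training his horse for a competition.",
        "premise": "A person on a horse jumps over a broken down airplane.",
    },
    "metadata": {},
    "responses": [{"status": "submitted", "values": {"label": {"value": "1"}}}],
    "suggestions": [],
}
print(argilla_to_hf_row(argilla_record))
```

Applied to the Argilla example, this reproduces the `datasets` example key for key.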

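Data Fields

The dataset fields are:

Fields: the dataset records themselves. Currently, only text fields are supported.
- premise: of type TextField.
- hypothesis: of type TextField.
Questions: the questions asked to annotators, which can be of different types.
- label: of type LabelQuestion, with allowed values [0, 1, 2].
Suggestions: since Argilla 1.13.0, optional suggestions can be included to assist annotators; each contains the suggested value and its metadata.
- label-suggestion: of type label_selection, with allowed values [0, 1, 2].
external_id: an optional field for providing an external ID for a dataset record.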
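Taken together, these fields define what a well-formed row looks like. A minimal validation sketch follows, assuming the flat `datasets` layout, treating `premise` and `hypothesis` as required (as the field table marks them), and accepting only `[0, 1, 2]` for both responses and the optional suggestion. The name `validate_row` and the error messages are hypothetical.

```python
from typing import Any

REQUIRED_TEXT_FIELDS = ("premise", "hypothesis")  # both marked Required
ALLOWED_LABELS = {"0", "1", "2"}  # [0, 1, 2], compared as strings


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a flat row (empty if valid)."""
    problems = []
    for name in REQUIRED_TEXT_FIELDS:
        value = row.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty text field: {name}")
    # Each annotator response must carry an allowed label value.
    for i, resp in enumerate(row.get("label") or []):
        if str(resp.get("value")) not in ALLOWED_LABELS:
            problems.append(f"response {i}: invalid label {resp.get('value')!r}")
    # The optional suggestion, when present, must use the same label set.
    suggestion = row.get("label-suggestion")
    if suggestion is not None and str(suggestion) not in ALLOWED_LABELS:
        problems.append(f"invalid label-suggestion {suggestion!r}")
    return problems


row = {
    "external_id": None,
    "premise": "A person on a horse jumps over a broken down airplane.",
    "hypothesis": "A person is training his horse for a competition.",
    "label": [{"status": "submitted", "user_id": None, "value": "1"}],
    "label-suggestion": None,
}
print(validate_row(row))  # → []
```

Returning a list of problems, rather than raising on the first one, makes it easy to report every issue in a row at once when auditing a whole split.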

Data Splits

The dataset contains a single split, `train`.
