jayelm/natural-instructions

Name: jayelm/natural-instructions
Creator: jayelm
Published: 2023-01-29 23:16:06
License: not specified

Source: Hugging Face. Updated: 2023-01-29. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/jayelm/natural-instructions

Resource description:

---
annotations_creators:
- crowdsourced
- expert-generated
language:
- en
multilinguality:
- monolingual
size_categories:
- 100M<n<1B
task_categories:
- other
---

Preprocessed version of Super-Natural-Instructions from https://github.com/allenai/natural-instructions/tree/master/splits. The same inputs may appear with different outputs, so to avoid duplicate inputs you can deduplicate by the `id` or the `inputs` field.

This is modified from https://huggingface.co/datasets/Muennighoff/natural-instructions with a few improvements:

1. Adds positive/negative examples, outputs, and explanations for each task, to support different task definitions.
2. Adds an "eval" field which is True for the first 100 examples of each test task (119 * 100 = 11900 examples). This field indicates whether an example is part of the abbreviated + balanced test split. See https://github.com/allenai/natural-instructions/blob/master/src/reorder_instances_for_testing.py.
3. Adds an "eval" field to the training dataset, which can be used as an in-domain evaluation set. To do so, we sample a balanced set by taking the first 15 examples of each train split (757 * 15 = 11355 examples) and mark the "eval" field as true.

Provided by:

jayelm

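Summary of original information

Dataset overview

Basic dataset information

Annotation creators: crowdsourced, expert-generated
Language: English
Multilinguality: monolingual
Dataset size: 100M<n<1B
Task category: other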

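Dataset modifications and improvements

Source: a preprocessed version of Super-Natural-Instructions, modified from Muennighoff/natural-instructions.
Improvements:
1. Adds positive/negative examples, outputs, and explanations for each task, to support different task definitions.
2. Adds an "eval" field that is true for the first 100 examples of each test task (119 * 100 = 11900 examples), indicating whether an example belongs to the abbreviated, balanced test split.
3. Adds an "eval" field to the training dataset, which can serve as an in-domain evaluation set. The first 15 examples of each train split are sampled as a balanced set (757 * 15 = 11355 examples) and their "eval" field is marked true.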
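The "first N examples per task" rule behind the "eval" field can be sketched in plain Python. This is only an illustration on toy records, not the script the dataset author used. The `task_name` key and the helper `mark_eval` are assumptions introduced here; only the `id`, `inputs`, and `eval` fields are named in the description.

```python
from collections import defaultdict


def mark_eval(examples, per_task):
    """Return copies of `examples` with eval=True for the first `per_task` rows of each task.

    `task_name` is an assumed grouping key; adjust it to the real schema.
    """
    seen = defaultdict(int)  # number of examples seen so far, per task
    marked = []
    for ex in examples:
        seen[ex["task_name"]] += 1
        marked.append({**ex, "eval": seen[ex["task_name"]] <= per_task})
    return marked


# Hypothetical toy records: two tasks with three examples each.
toy = [
    {"task_name": "task_a", "id": "a-1", "inputs": "x1"},
    {"task_name": "task_a", "id": "a-2", "inputs": "x2"},
    {"task_name": "task_a", "id": "a-3", "inputs": "x3"},
    {"task_name": "task_b", "id": "b-1", "inputs": "y1"},
    {"task_name": "task_b", "id": "b-2", "inputs": "y2"},
    {"task_name": "task_b", "id": "b-3", "inputs": "y3"},
]

marked = mark_eval(toy, per_task=2)
eval_subset = [ex for ex in marked if ex["eval"]]

# Sizes quoted in the description: tasks * examples per task.
test_eval_size = 119 * 100
train_eval_size = 757 * 15
```

Because every task contributes the same number of examples, the resulting subset is balanced across tasks, which is what the quoted totals (11900 and 11355) reflect.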

Dataset processing

Deduplication: deduplicate by the `id` or the `inputs` field to avoid duplicate inputs.
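A minimal sketch of this deduplication, assuming each record is a dictionary carrying the `id` and `inputs` fields. The helper `dedupe`, the `output` key, and the toy records are hypothetical and exist only for illustration. The sketch keeps the first record seen for each key value and drops later ones.

```python
def dedupe(examples, key="inputs"):
    """Keep the first example for each distinct value of `key`, preserving order."""
    seen = set()
    unique = []
    for ex in examples:
        if ex[key] not in seen:
            seen.add(ex[key])
            unique.append(ex)
    return unique


# Hypothetical toy records: the same input appears twice with different outputs.
records = [
    {"id": "t1-001", "inputs": "Is the sky blue?", "output": "yes"},
    {"id": "t1-001", "inputs": "Is the sky blue?", "output": "Yes, it is."},
    {"id": "t1-002", "inputs": "Is grass red?", "output": "no"},
]

by_inputs = dedupe(records, key="inputs")
by_id = dedupe(records, key="id")
```

Keeping the first occurrence is a design choice of this sketch. If every reference output for an input matters, grouping the outputs by `id` would be the alternative.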
