jayelm/natural-instructions
Hugging Face, updated 2023-01-29, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/jayelm/natural-instructions
Resource description:
---
annotations_creators:
- crowdsourced
- expert-generated
language:
- en
multilinguality:
- monolingual
size_categories:
- 100M<n<1B
task_categories:
- other
---
Preprocessed version of Super-Natural-Instructions from https://github.com/allenai/natural-instructions/tree/master/splits. The same input may appear with different outputs; to avoid duplicate inputs, deduplicate by the `id` or the `inputs` field.
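The deduplication mentioned above can be sketched in plain Python. This is a minimal illustration, not the dataset author's code: the helper `dedupe_by_field` and the toy rows are hypothetical, and only the field names `id` and `inputs` come from this card (`targets` is an assumed name for the output field).

```python
def dedupe_by_field(records, field="inputs"):
    """Keep the first record seen for each distinct value of `field`."""
    seen = set()
    unique = []
    for record in records:
        key = record[field]
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Toy rows (hypothetical): the same input appears with two different outputs.
rows = [
    {"id": "task001-a", "inputs": "2 + 2 = ?", "targets": "4"},
    {"id": "task001-a", "inputs": "2 + 2 = ?", "targets": "four"},
    {"id": "task001-b", "inputs": "3 + 3 = ?", "targets": "6"},
]

deduped = dedupe_by_field(rows, field="inputs")
```

Keeping the first occurrence is an arbitrary choice made for this sketch; which output to retain for a repeated input depends on the use case.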
This is modified from https://huggingface.co/datasets/Muennighoff/natural-instructions
with a few improvements:
1. Adds positive/negative examples, outputs, explanations for each task, to
support different task definitions.
2. Adds an "eval" field which is True for the first 100 examples of each
test task (119 * 100 = 11900 examples). This field indicates whether an example
is part of the abbreviated + balanced test split. See
https://github.com/allenai/natural-instructions/blob/master/src/reorder_instances_for_testing.py.
3. Adds an "eval" field to the training dataset, which can be used as an
in-domain evaluation set. To do so, we sample a balanced set consisting of the
first 15 examples of each train split (757 * 15 = 11355 examples) and mark
their "eval" field as true.
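As a rough illustration of how an "eval" flag of this kind behaves, the sketch below marks the first `n` rows of each task and then splits on the flag. It is an assumption-laden toy, not the preprocessing script used for this dataset: the helpers `mark_first_n_per_task` and `split_on_eval`, the `task_name` field, and the sample rows are all hypothetical; only the `eval` and `inputs` field names and the counts come from the card.

```python
from collections import defaultdict


def mark_first_n_per_task(records, n, task_field="task_name"):
    """Return copies of `records` with eval=True for the first n rows of each task."""
    seen_per_task = defaultdict(int)
    marked = []
    for record in records:
        task = record[task_field]
        marked.append({**record, "eval": seen_per_task[task] < n})
        seen_per_task[task] += 1
    return marked


def split_on_eval(records):
    """Split records into (eval rows, remaining rows) using the boolean `eval` field."""
    eval_rows = [r for r in records if r["eval"]]
    other_rows = [r for r in records if not r["eval"]]
    return eval_rows, other_rows


# Toy rows (hypothetical): `task_name` is an assumed grouping field.
rows = [
    {"task_name": "task_a", "inputs": "a1"},
    {"task_name": "task_a", "inputs": "a2"},
    {"task_name": "task_a", "inputs": "a3"},
    {"task_name": "task_b", "inputs": "b1"},
    {"task_name": "task_b", "inputs": "b2"},
]

marked = mark_first_n_per_task(rows, n=2)
eval_rows, rest = split_on_eval(marked)

# Subset sizes quoted in the card.
test_eval_size = 119 * 100   # 11900
train_eval_size = 757 * 15   # 11355
```

Because every task contributes the same number of rows, the resulting subset is balanced across tasks regardless of how large each task is. If the rows are held in a Hugging Face `datasets` object, a predicate such as `lambda r: r["eval"]` could presumably be passed to `Dataset.filter` instead of using `split_on_eval`.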
Provided by:
jayelm
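Summary of original information
Dataset overview
Basic dataset information
- Annotation creators: crowdsourced, expert-generated
- Language: English
- Multilinguality: monolingual
- Dataset size: 100M<n<1B
- Task category: other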
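Dataset modifications and improvements
- Source: a preprocessed version of Super-Natural-Instructions, modified from Muennighoff/natural-instructions.
- Improvements:
  - Adds positive/negative examples, outputs, and explanations for each task, to support different task definitions.
  - Adds an "eval" field that is true for the first 100 examples of each test task (119 * 100 = 11900 examples), indicating whether an example belongs to the abbreviated, balanced test split.
  - Adds an "eval" field to the training dataset, which can be used as an in-domain evaluation set. It is built by taking a balanced sample of the first 15 examples of each train split (757 * 15 = 11355 examples) and marking their "eval" field as true.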
Dataset processing
- Deduplication: deduplicate by the `id` or `inputs` field to avoid duplicate inputs.



