zest

Name: zest
Creator: maas
Published: 2025-07-03 16:28:59
License: No description provided

ModelScope Community: updated 2025-07-03, indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/allenai/zest


Resource description:

# Dataset Card for "ZEST: ZEroShot learning from Task descriptions"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://allenai.org/data/zest
- **Repository:** https://github.com/allenai/zest
- **Paper:** https://arxiv.org/abs/2011.08115
- **Leaderboard:** https://leaderboard.allenai.org/zest/submissions/public
- **Point of Contact:**

### Dataset Summary

ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.
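The layout described in the summary, one task description paired with a set of annotated (input, output) examples, can be sketched roughly as follows. This is a minimal illustration and not the dataset's actual schema: the card leaves the data fields undocumented, so `Task`, `description`, `examples`, `zero_shot_inputs`, and `task_accuracy` are hypothetical names, the toy task is invented, and exact-match accuracy is only a stand-in for the leaderboard's acceptability metrics.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Task:
    """One task: a natural language description plus annotated examples.

    Field names are hypothetical; the card does not document the real schema.
    """

    description: str
    examples: List[Tuple[str, str]]  # (input text, expected output)


def zero_shot_inputs(task: Task) -> List[str]:
    """Pair the task description with each input; no labeled examples are shown."""
    return [f"{task.description}\n{text}" for text, _ in task.examples]


def task_accuracy(task: Task, predict: Callable[[str], str]) -> float:
    """Fraction of a task's examples that a predictor answers exactly right.

    Exact match is an assumption made for this sketch, not the official metric.
    """
    prompts = zero_shot_inputs(task)
    gold = [answer for _, answer in task.examples]
    hits = sum(predict(prompt) == answer for prompt, answer in zip(prompts, gold))
    return hits / len(gold)


# Toy task with 2 examples instead of 20, invented purely for illustration.
toy = Task(
    description="Does this text mention a national park?",
    examples=[
        ("Yosemite is crowded in July.", "yes"),
        ("The bakery opens at nine.", "no"),
    ],
)


def always_no(prompt: str) -> str:
    """Trivial baseline predictor that ignores its input."""
    return "no"
```

Calling `task_accuracy(toy, always_no)` scores the baseline on a single task. Keeping examples grouped under their task, rather than in one flat list, is what makes per-task scoring like this straightforward.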
### Supported Tasks and Leaderboards

A [leaderboard](https://leaderboard.allenai.org/zest/submissions/public) is included with acceptability metrics for each of the four generalization types outlined in the paper. The metrics are novel acceptability metrics also proposed by the authors.

### Languages

The dataset is in English.

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

To evaluate the ability of a model to generalize to unseen tasks based only on a task description in a zero-shot manner.

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

Mechanical Turk crowd workers.

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

Mechanical Turk crowd workers.

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

The dataset emphasizes a model's ability to generalize to unseen tasks with only a natural language description of the task. The long-term vision of this type of evaluation is to facilitate the creation of models which can perform arbitrary tasks with only a prompt from a non-technical user. This could broaden the range of things a user can ask a system such as a chatbot to do for them, but it is unclear how restrictions would be put in place to prevent users from prompting a system to perform unethical tasks.

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

This dataset is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
### Citation Information

```
@inproceedings{weller-etal-2020-learning,
    title = "Learning from Task Descriptions",
    author = "Weller, Orion and Lourie, Nicholas and Gardner, Matt and Peters, Matthew",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.105",
    pages = "1361--1375",
    abstract = "Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks. Formulating task descriptions as questions, we ensure each is general enough to apply to many possible inputs, thus comprehensively evaluating a model{'}s ability to solve each task. Moreover, the dataset{'}s structure tests specific types of systematic generalization. We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers.",
}
```

### Contributions

Thanks to [@joeddav](https://github.com/joeddav) for adding this dataset.


Provider:

maas

Created:

2025-05-27
