P3

Name: P3
Creator: maas
Published: 2025-12-05 16:56:48
License: No description provided

ModelScope community: updated 2025-12-05, indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/bigscience/P3

Resource description:

# Dataset Card for P3

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://bigscience.huggingface.co/promptsource
- **Repository:** https://github.com/bigscience-workshop/promptsource/
- **Paper:** [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207)
- **Point of Contact:** [Victor Sanh](mailto:victor@huggingface.co)

### Dataset Summary

P3 (Public Pool of Prompts) is a collection of prompted English datasets covering a diverse set of NLP tasks. A prompt is the combination of an input template and a target template. The templates are functions mapping a data example into natural language for the input and target sequences. For example, in the case of an NLI dataset, the data example would include fields for *Premise, Hypothesis, Label*. An input template would be *If {Premise} is true, is it also true that {Hypothesis}?*, whereas a target template can be defined with the label choices *Choices[label]*. Here *Choices* is prompt-specific metadata that consists of the options *yes, maybe, no* corresponding to *label* being entailment (0), neutral (1) or contradiction (2).
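The NLI prompt described above can be sketched in a few lines of Python. This is only an illustration of the input-template and target-template idea, not Promptsource's own template code: the function names (`input_template`, `target_template`, `apply_prompt`) and the sample premise and hypothesis are assumptions made up for this example.

```python
# Prompt-specific metadata: answer options indexed by label,
# i.e. entailment (0), neutral (1), contradiction (2).
CHOICES = ["yes", "maybe", "no"]


def input_template(example: dict) -> str:
    """Map a data example to the natural language input sequence."""
    return (
        f"If {example['Premise']} is true, "
        f"is it also true that {example['Hypothesis']}?"
    )


def target_template(example: dict) -> str:
    """Map a data example to the natural language target: Choices[label]."""
    return CHOICES[example["Label"]]


def apply_prompt(example: dict) -> dict:
    """Hypothetical helper: build one prompted example from one data example."""
    return {
        "inputs_pretokenized": input_template(example),
        "targets_pretokenized": target_template(example),
        "answer_choices": CHOICES,
    }


# Made-up NLI example, for illustration only.
example = {
    "Premise": "A dog is running in the park",
    "Hypothesis": "An animal is outside",
    "Label": 0,
}
prompted = apply_prompt(example)
print(prompted["inputs_pretokenized"])
print(prompted["targets_pretokenized"])
```

The output keys mirror the `inputs_pretokenized`, `targets_pretokenized` and `answer_choices` fields of the released data; the tokenized `inputs` and `targets` fields are left out because they require the T5 tokenizer.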

Prompts are collected using [Promptsource](https://github.com/bigscience-workshop/promptsource), an interface to interactively write prompts on datasets and collect prompt-specific metadata such as evaluation metrics. As of October 13th, 2021, there are 2,000 prompts collected for 270+ data(sub)sets. The collection of prompts of P3 is publicly available on [Promptsource](https://github.com/bigscience-workshop/promptsource).

To train [T0*](https://huggingface.co/bigscience/T0pp), we used a subset of the prompts available in Promptsource (see details [here](https://huggingface.co/bigscience/T0pp#training-data)). However, some of the prompts use `random.choice`, a method that selects an option uniformly at random from a list of valid possibilities. For reproducibility purposes, we release the collection of prompted examples used to train T0*. **The data available here are the materialized version of the prompted datasets used in [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207), which represent only a subset of the datasets for which there is at least one prompt in Promptsource.**

### Supported Tasks and Leaderboards

The tasks represented in P3 cover a diverse set of NLP tasks including multiple-choice QA, sentiment analysis and natural language inference. We detail the full list of datasets in [Source Data](#source-data).

### Languages

The data in P3 are in English (BCP-47 `en`).

## Dataset Structure

### Data Instances

An example of "train" looks as follows:

```python
{
  'answer_choices': ['safe', 'trolley'],
  'inputs': [86, 8, 7142, 666, 6, 405, 8, 3, 834, 1518, 21, 1346, 42, 31682, 58, 37, 3, 929, 9, 3042, 63, 2765, 808, 8, 2045, 6448, 326, 13, 8, 31682, 11, 3, 24052, 135, 16, 8, 1346, 552, 8, 3, 834, 47, 6364, 5],
  'inputs_pretokenized': 'In the sentence below, does the _ stand for safe or trolley?\nThe treasury workers took the gold bars off of the trolley and stacked them in the safe until the _ was empty.',
  'targets': [31682, 1],
  'targets_pretokenized': '\ntrolley'
}
```

In the case of rank classification (letting the model select as its prediction the option with the highest log-likelihood), an example looks as follows:

```python
{
  'idx': [5, 0],
  'inputs': [86, 8, 7142, 666, 6, 405, 8, 3, 834, 1518, 21, 19454, 42, 22227, 58, 19454, 744, 31, 17, 2112, 4553, 17742, 7, 12, 1953, 6, 298, 22227, 966, 373, 405, 5, 3, 834, 19, 72, 952, 12, 619, 16, 3, 9, 17742, 3298, 5],
  'inputs_pretokenized': "In the sentence below, does the _ stand for Kyle or Logan?\nKyle doesn't wear leg warmers to bed, while Logan almost always does. _ is more likely to live in a warmer climate.",
  'is_correct': True,
  'targets': [19454, 1],
  'targets_pretokenized': 'Kyle',
  'weight': 1.0
}
```

To browse all the prompted examples, you can use the [Promptsource hosted tool](http://bigscience.huggingface.co/promptsource) and choose the `Prompted dataset viewer` mode in the left panel.
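
Rank classification over rows of this shape can be sketched as below, assuming each row's `idx` is an `(example, answer_option_id)` pair as described under Data Fields. The function name `rank_classification_accuracy` is hypothetical, the log-likelihood scores are invented stand-ins for real model outputs, and every row other than the `Kyle` one is made up for illustration.

```python
def rank_classification_accuracy(records, scores):
    """Pick, for each example, the answer option with the highest score.

    records: rank-classification rows with 'idx' = [example_id, option_id]
             and a boolean 'is_correct'.
    scores:  one log-likelihood per row (hypothetical model output).
    Returns the fraction of examples whose top-scored option is correct.
    """
    best = {}  # example_id -> (best score so far, is_correct of that option)
    for record, score in zip(records, scores):
        example_id, _option_id = record["idx"]
        if example_id not in best or score > best[example_id][0]:
            best[example_id] = (score, record["is_correct"])
    n_correct = sum(1 for _score, is_correct in best.values() if is_correct)
    return n_correct / len(best)


# One row per (example, answer option). Only the first row comes from the
# card; the others are invented to complete the illustration.
records = [
    {"idx": [5, 0], "targets_pretokenized": "Kyle", "is_correct": True},
    {"idx": [5, 1], "targets_pretokenized": "Logan", "is_correct": False},
    {"idx": [6, 0], "targets_pretokenized": "option A", "is_correct": False},
    {"idx": [6, 1], "targets_pretokenized": "option B", "is_correct": True},
]
# Invented log-likelihoods of each target given its input.
scores = [-1.2, -3.4, -0.9, -2.0]

accuracy = rank_classification_accuracy(records, scores)
print(accuracy)
```

With these invented scores, example 5 is predicted correctly and example 6 is not, so the sketch reports an accuracy of 0.5.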

### Data Fields

The data fields are the same among all splits:
- `answer_choices`: the choices (in natural language) available to the model
- `inputs_pretokenized`: the natural language input fed to the model
- `targets_pretokenized`: the natural language target that the model has to generate
- `inputs`: the tokenized input with [T5](https://huggingface.co/google/t5-v1_1-base)'s tokenizer
- `targets`: the tokenized target with [T5](https://huggingface.co/google/t5-v1_1-base)'s tokenizer
- `idx`: identifier of the (example, answer_option_id) in the case of rank classification
- `weight`: a weight for the example produced by seqio (always set to 1.0 in practice)
- `is_correct`: whether the (example, answer_option_id) is the correct one

### Data Splits

The list of data splits and their respective sizes is very long. You'll find the whole list in this [file](https://huggingface.co/datasets/bigscience/P3/blob/main/tasks_splits_and_features.py).

## Dataset Creation

### Curation Rationale

The Public Pool of Prompts relies on the Hugging Face Datasets library. Any public dataset in the Datasets library can be prompted. We selected the datasets that have at least one subset in English and excluded datasets containing (predominantly) non-natural language examples.

We conservatively decided not to prompt datasets that contain potentially harmful content (for instance, datasets built on social media content). However, we sometimes prompt datasets that are purposefully built to measure bias and fairness of trained models, and reserve these prompted datasets (the validation or test sets) for evaluation purposes.

### Source Data

Here's the full list of the datasets present in the materialized version of P3:
- Multiple-Choice QA
  - CommonsenseQA
  - DREAM
  - QUAIL
  - QuaRTz
  - Social IQA
  - WiQA
  - Cosmos
  - QASC
  - Quarel
  - SciQ
  - Wiki Hop
  - ARC
  - OpenBookQA
  - MultiRC
  - PIQA
  - RACE
  - HellaSwag
  - BoolQ
- Extractive QA
  - Adversarial QA
  - Quoref
  - DuoRC
  - ROPES
  - SQuAD v2
  - ReCoRD
- Close-book QA
  - Hotpot QA
  - Wiki QA
  - Trivia QA
  - Web Questions
- Structure-to-text
  - Common Gen
  - Wiki Bio
- Sentiment
  - Amazon
  - App Reviews
  - IMDB
  - Rotten Tomatoes
  - Yelp
- Summarization
  - CNN Daily Mail
  - Gigaword
  - MultiNews
  - SamSum
  - XSum
- Topic Classification
  - AG News
  - DBPedia
  - TREC
- Paraphrase Identification
  - MRPC
  - PAWS
  - QQP
- Natural Language Inference
  - ANLI
  - CB
  - RTE
- Coreference Resolution
  - WSC
  - Winogrande
- Word Sense disambiguation
  - WiC
- Sentence Completion
  - COPA
  - HellaSwag
  - Story Cloze

### Annotations

The prompts available in Promptsource are collected as part of BigScience, a one-year-long research workshop on large multilingual models and datasets. 36 contributors affiliated with 24 institutions in 8 countries participated in the prompt collection. Most contributors are machine learning researchers or machine learning engineers.

The main annotation guideline was that prompts needed to be grammatical and understandable by a native English speaker with no prior experience of the tasks. Additionally, prompts that required explicit counting or numerical indexing were removed in favor of natural language variants: for example, instead of predicting the indices of a span to extract (as in extractive question answering), the model was expected to copy the span's text. With these minimal constraints, prompt writers were encouraged to use both formal and creative prompts and various orderings of the data.
Most of the prompts correspond directly to a version of the original proposed task, although we also allowed prompts that permuted the original task (for instance, generating a document from its summary) or allowed for ambiguous output (for instance, not indicating a list of available choices).

The full annotation guidelines given to the contributors can be found [here](https://github.com/bigscience-workshop/promptsource/blob/main/CONTRIBUTING.md). *Note to self: the link is currently being updated with the)

## Additional Information

### Licensing Information

The dataset is released under Apache 2.0.

### Citation Information

```bibtex
@misc{sanh2021multitask,
      title={Multitask Prompted Training Enables Zero-Shot Task Generalization},
      author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush},
      year={2021},
      eprint={2110.08207},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

### Contributions

Thanks to the contributors of [promptsource](https://github.com/bigscience-workshop/promptsource/graphs/contributors) for adding this dataset.


Provider: maas

Created: 2025-11-18
