piqa

Name: piqa
Creator: maas
Published: 2025-12-05 23:47:56
License: No description provided

ModelScope community; updated 2025-12-05; indexed 2024-08-31

Download link:

https://modelscope.cn/datasets/opencompass/piqa


Resource description:

# Dataset Card for "Physical Interaction: Question Answering"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [PIQA homepage](https://yonatanbisk.com/piqa/)
- **Paper:** [PIQA: Reasoning about Physical Commonsense in Natural Language](https://arxiv.org/abs/1911.11641)
- **Leaderboard:** [Official leaderboard](https://yonatanbisk.com/piqa/) *Note that there is a [2nd leaderboard](https://leaderboard.allenai.org/physicaliqa) featuring a different (blind) test set with 3,446 examples as part of the Machine Commonsense DARPA project.*
- **Point of Contact:** [Yonatan Bisk](https://yonatanbisk.com/piqa/)

### Dataset Summary

*To apply eyeshadow without a brush, should I use a cotton swab or a toothpick?*
Questions requiring this kind of physical commonsense pose a challenge to state-of-the-art natural language understanding systems.
The PIQA dataset introduces the task of physical commonsense reasoning together with a corresponding benchmark, Physical Interaction: Question Answering (PIQA). Physical commonsense knowledge is a major challenge on the road to true AI-completeness, including robots that interact with the world and understand natural language.

PIQA focuses on everyday situations with a preference for atypical solutions. The dataset is inspired by instructables.com, which provides users with instructions on how to build, craft, bake, or manipulate objects using everyday materials.

### Supported Tasks and Leaderboards

The underlying task is formulated as multiple-choice question answering: given a question `q` and two possible solutions `s1` and `s2`, a model or a human must choose the most appropriate solution, of which exactly one is correct.

### Languages

The text in the dataset is in English. The associated BCP-47 code is `en`.

## Dataset Structure

### Data Instances

An example looks like this:

```
{
  "goal": "How do I ready a guinea pig cage for it's new occupants?",
  "sol1": "Provide the guinea pig with a cage full of a few inches of bedding made of ripped paper strips, you will also need to supply it with a water bottle and a food dish.",
  "sol2": "Provide the guinea pig with a cage full of a few inches of bedding made of ripped jeans material, you will also need to supply it with a water bottle and a food dish.",
  "label": 0
}
```

Note that the test set contains no labels. Predictions need to be submitted to the leaderboard.

### Data Fields

List and describe the fields present in the dataset. Mention their data type, and whether they are used as input or output in any of the tasks the dataset currently supports. If the data has span indices, describe their attributes, such as whether they are at the character level or word level, whether they are contiguous or not, etc.
If the dataset contains example IDs, state whether they have an inherent meaning, such as a mapping to other datasets or pointing to relationships between data points.

- `goal`: the question which requires physical commonsense to be answered correctly
- `sol1`: the first solution
- `sol2`: the second solution
- `label`: the correct solution. `0` refers to `sol1` and `1` refers to `sol2`

### Data Splits

The dataset contains 16,000 examples for training, 2,000 for development and 3,000 for testing.

## Dataset Creation

### Curation Rationale

The goal of the dataset is to construct a resource that requires concrete physical reasoning.

### Source Data

The authors provide a prompt to the annotators derived from instructables.com. The Instructables website is a crowdsourced collection of instructions for doing everything from cooking to car repair. In most cases, users provide images or videos detailing each step and a list of tools that will be required. Most goals are simultaneously rare and unsurprising. While an annotator is unlikely to have built a UV-fluorescent steampunk lamp or made a backpack out of duct tape, it is not surprising that someone interested in home crafting would create these, nor will the tools and materials be unfamiliar to the average person. Using these examples as the seed for their annotation helps remind annotators about the less prototypical uses of everyday objects. A second, equally important point is that instructions build on one another. This means that any QA pair inspired by an instructable is more likely to explicitly state assumptions about what preconditions need to be met to start the task and what postconditions define success. Annotators were asked to glance at the instructions of an instructable and either pull out two component tasks or let it inspire them to construct two. They would then articulate the goal (often centered on atypical materials) and how to achieve it.
In addition, annotators were asked to provide a permutation of their own solution that makes it invalid (the negative solution), often subtly.

#### Initial Data Collection and Normalization

During validation, examples with low agreement were removed from the data. The dataset is further cleaned of stylistic artifacts and trivial examples, which have been shown to artificially inflate model performance on previous NLI benchmarks. This cleaning uses the AFLite algorithm introduced in ([Sakaguchi et al., 2020](https://arxiv.org/abs/1907.10641); [Sap et al., 2019](https://arxiv.org/abs/1904.09728)), which is an improvement on adversarial filtering ([Zellers et al., 2018](https://arxiv.org/abs/1808.05326)).

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

Annotations are by construction obtained when crowdsourcers complete the prompt.

#### Who are the annotators?

Paid crowdsourcers

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

Unknown

### Citation Information

```
@inproceedings{Bisk2020,
  author = {Yonatan Bisk and Rowan Zellers and Ronan Le Bras and Jianfeng Gao and Yejin Choi},
  title = {PIQA: Reasoning about Physical Commonsense in Natural Language},
  booktitle = {Thirty-Fourth AAAI Conference on Artificial Intelligence},
  year = {2020},
}
```

### Contributions

Thanks to [@VictorSanh](https://github.com/VictorSanh) for adding this dataset.
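
### Example: working with the data fields

To make the field layout from the Data Fields section concrete, here is a minimal Python sketch of one record and of accuracy scoring for the two-way choice. The `PiqaExample` class, the helper names, and the always-pick-`sol1` baseline are illustrative assumptions, not part of any official PIQA tooling. Representing a missing test label as `None` is also an assumption, since the card only states that the test set contains no labels.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class PiqaExample:
    """One PIQA record, using the field names from the Data Fields section."""
    goal: str
    sol1: str
    sol2: str
    # 0 refers to sol1, 1 refers to sol2.
    # None is an assumed stand-in for unlabeled test examples.
    label: Optional[int]


def correct_solution(example: PiqaExample) -> str:
    """Return the text of the labeled solution (0 -> sol1, 1 -> sol2)."""
    if example.label not in (0, 1):
        raise ValueError("example has no usable label")
    return example.sol1 if example.label == 0 else example.sol2


def accuracy(examples: Iterable[PiqaExample],
             predict: Callable[[PiqaExample], int]) -> float:
    """Fraction of labeled examples for which predict() returns the label."""
    labeled = [ex for ex in examples if ex.label in (0, 1)]
    if not labeled:
        raise ValueError("no labeled examples to score")
    hits = sum(1 for ex in labeled if predict(ex) == ex.label)
    return hits / len(labeled)


def always_first(example: PiqaExample) -> int:
    """Hypothetical trivial baseline: always choose sol1."""
    return 0


# The instance shown in the Data Instances section, copied verbatim.
guinea_pig = PiqaExample(
    goal="How do I ready a guinea pig cage for it's new occupants?",
    sol1=("Provide the guinea pig with a cage full of a few inches of bedding "
          "made of ripped paper strips, you will also need to supply it with "
          "a water bottle and a food dish."),
    sol2=("Provide the guinea pig with a cage full of a few inches of bedding "
          "made of ripped jeans material, you will also need to supply it "
          "with a water bottle and a food dish."),
    label=0,
)
```

Calling `accuracy([guinea_pig], always_first)` scores the baseline on this single example. Unlabeled examples are skipped by `accuracy`, which mirrors the card's note that test predictions must be submitted to the leaderboard rather than scored locally.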


Provider:

maas

Created:

2024-07-02
