StonyBrookNLP/tellmewhy

Name: StonyBrookNLP/tellmewhy
Creator: StonyBrookNLP
Published: 2024-01-24 21:12:22
License: No description available

Hugging Face | Updated 2024-01-24 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/StonyBrookNLP/tellmewhy

Resource description:

---
annotations_creators:
- crowdsourced
language_creators:
- found
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text2text-generation
task_ids: []
paperswithcode_id: null
pretty_name: TellMeWhy
---

# Dataset Card for TellMeWhy

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://stonybrooknlp.github.io/tellmewhy/
- **Repository:** https://github.com/StonyBrookNLP/tellmewhy
- **Paper:** https://aclanthology.org/2021.findings-acl.53/
- **Leaderboard:** None
- **Point of Contact:** [Yash Kumar Lal](mailto:ylal@cs.stonybrook.edu)

### Dataset Summary

TellMeWhy is a large-scale crowdsourced dataset made up of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described.

### Supported Tasks and Leaderboards

The dataset is designed to test why-question answering abilities of models when bound by local context.

### Languages

English

## Dataset Structure

### Data Instances

A typical data point consists of a story, a question and a crowdsourced answer to that question. The instance also indicates whether the question's answer is implicit or explicitly stated in the text. If applicable, it also contains Likert scores (-2 to 2) about the answer's grammaticality and validity in the given context.

```
{
  "narrative": "Cam ordered a pizza and took it home. He opened the box to take out a slice. Cam discovered that the store did not cut the pizza for him. He looked for his pizza cutter but did not find it. He had to use his chef knife to cut a slice.",
  "question": "Why did Cam order a pizza?",
  "original_sentence_for_question": "Cam ordered a pizza and took it home.",
  "narrative_lexical_overlap": 0.3333333333,
  "is_ques_answerable": "Not Answerable",
  "answer": "Cam was hungry.",
  "is_ques_answerable_annotator": "Not Answerable",
  "original_narrative_form": [
    "Cam ordered a pizza and took it home.",
    "He opened the box to take out a slice.",
    "Cam discovered that the store did not cut the pizza for him.",
    "He looked for his pizza cutter but did not find it.",
    "He had to use his chef knife to cut a slice."
  ],
  "question_meta": "rocstories_narrative_41270_sentence_0_question_0",
  "helpful_sentences": [],
  "human_eval": false,
  "val_ann": [],
  "gram_ann": []
}
```

### Data Fields

- `question_meta` - Unique meta for each question in the corpus
- `narrative` - Full narrative from ROCStories. Used as the context with which the question and answer are associated
- `question` - Why question about an action or event in the narrative
- `answer` - Crowdsourced answer to the question
- `original_sentence_for_question` - Sentence in narrative from which question was generated
- `narrative_lexical_overlap` - Unigram overlap of answer with the narrative
- `is_ques_answerable` - Majority judgment by annotators on whether an answer to this question is explicitly stated in the narrative.
  If "Not Answerable", it is part of the Implicit-Answer questions subset, which is harder for models.
- `is_ques_answerable_annotator` - Individual annotator judgment on whether an answer to this question is explicitly stated in the narrative.
- `original_narrative_form` - ROCStories narrative as an array of its sentences
- `human_eval` - Indicates whether a question is a specific part of the test set. Models should be evaluated for their answers on these questions using the human evaluation suite released by the authors. They advocate for this human evaluation to be the correct way to track progress on this dataset.
- `val_ann` - Array of Likert scores (possible sizes are 0 and 3) about whether an answer is valid given the question and context. Empty arrays exist for cases where the `human_eval` flag is False.
- `gram_ann` - Array of Likert scores (possible sizes are 0 and 3) about whether an answer is grammatical. Empty arrays exist for cases where the `human_eval` flag is False.

### Data Splits

The data is split into training, validation, and test sets.

| Train | Valid | Test |
| ------ | ----- | ----- |
| 23964 | 2992 | 3563 |

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

ROCStories corpus (Mostafazadeh et al., 2016)

#### Initial Data Collection and Normalization

ROCStories was used to create why-questions related to actions and events in the stories.

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

Amazon Mechanical Turk workers were provided a story and an associated why-question, and asked to answer. Three answers were collected for each question. For a small subset of questions, the quality of answers was also validated in a second round of annotation. This smaller subset should be used to perform human evaluation of any new models built for this dataset.

#### Who are the annotators?

Amazon Mechanical Turk workers

### Personal and Sensitive Information

None

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Evaluation

To evaluate progress on this dataset, the authors advocate for human evaluation and release a suite with the required settings [here](https://github.com/StonyBrookNLP/tellmewhy). Once inference on the test set has been completed, please filter out the answers on which human evaluation needs to be performed by selecting the questions (one answer per question, deduplication might be needed) in the test set where the `human_eval` flag is set to `True`. This subset can then be used to complete the requisite evaluation on TellMeWhy.

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

```
@inproceedings{lal-etal-2021-tellmewhy,
    title = "{T}ell{M}e{W}hy: A Dataset for Answering Why-Questions in Narratives",
    author = "Lal, Yash Kumar and Chambers, Nathanael and Mooney, Raymond and Balasubramanian, Niranjan",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.53",
    doi = "10.18653/v1/2021.findings-acl.53",
    pages = "596--610",
}
```

### Contributions

Thanks to [@yklal95](https://github.com/ykl7) for adding this dataset.

Provider:

StonyBrookNLP

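Summary of original information

Dataset overview

Dataset name

TellMeWhy

Dataset summary

TellMeWhy is a large-scale crowdsourced dataset of more than 30,000 questions and free-form answers about why characters in short narratives perform the actions described.

Supported tasks

The dataset is designed to test a model's ability to answer why-questions when constrained to local context.

Language

English

Dataset structure

Data instances

Each data point consists of a story, a question, and a crowdsourced answer. It also indicates whether the answer is explicitly stated in the text and, where applicable, carries Likert scores (-2 to 2) for the answer's grammaticality and validity.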
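
As a rough illustration of how the Likert annotations might be consumed, the sketch below averages one score array and treats an empty array as "not evaluated". The helper name `mean_likert` and the use of an arithmetic mean are assumptions made for illustration; the dataset card only states the score range and does not prescribe an aggregation.

```python
from statistics import mean
from typing import Optional, Sequence

LIKERT_MIN, LIKERT_MAX = -2, 2  # score range stated in the dataset card


def mean_likert(scores: Sequence[float]) -> Optional[float]:
    """Average a list of Likert scores; return None when the list is empty.

    Hypothetical helper: the card does not say how scores should be combined.
    """
    if not scores:
        return None  # e.g. instances that were not part of the human evaluation
    for score in scores:
        if not LIKERT_MIN <= score <= LIKERT_MAX:
            raise ValueError(f"Likert score out of range: {score}")
    return mean(scores)


# Made-up scores for illustration only, not taken from the dataset.
print(mean_likert([2, 1, 1]))
print(mean_likert([]))
```

Returning `None` rather than 0 for an empty array keeps unevaluated answers from being mistaken for neutral ones.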

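Data fields

question_meta: Unique metadata for each question in the corpus.
narrative: Full narrative from ROCStories, used as the context for the question and answer.
question: Why-question about an action or event in the narrative.
answer: Crowdsourced answer.
original_sentence_for_question: Sentence in the narrative from which the question was generated.
narrative_lexical_overlap: Lexical (unigram) overlap of the answer with the narrative.
is_ques_answerable: Majority judgment by annotators on whether the answer is explicitly stated in the narrative.
is_ques_answerable_annotator: Individual annotator judgment on whether the answer is explicitly stated in the narrative.
original_narrative_form: ROCStories narrative as an array of sentences.
human_eval: Indicates whether the question belongs to a specific part of the test set.
val_ann: Array of Likert scores for the answer's validity.
gram_ann: Array of Likert scores for the answer's grammaticality.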
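
To make the `narrative_lexical_overlap` field concrete, here is a minimal sketch that computes the fraction of answer tokens also found in the narrative. The tokenization (lowercasing, splitting on non-alphanumeric characters) and the function name `unigram_overlap` are assumptions; the authors' exact procedure is not described here. Under these assumptions the sketch yields one third for the example instance from the card, which agrees with the stored value of 0.3333333333.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_overlap(answer: str, narrative: str) -> float:
    """Fraction of answer tokens that also occur in the narrative."""
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    narrative_vocab = set(tokenize(narrative))
    hits = sum(1 for token in answer_tokens if token in narrative_vocab)
    return hits / len(answer_tokens)


# Example instance from the dataset card.
narrative = (
    "Cam ordered a pizza and took it home. He opened the box to take out a "
    "slice. Cam discovered that the store did not cut the pizza for him. He "
    "looked for his pizza cutter but did not find it. He had to use his chef "
    "knife to cut a slice."
)
answer = "Cam was hungry."

# Only "cam" appears in the narrative, so 1 of 3 answer tokens overlaps.
print(round(unigram_overlap(answer, narrative), 4))
```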

Data splits

The dataset is divided into training, validation, and test sets as follows:

Train	Validation	Test
23964	2992	3563
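
The split sizes can be checked with a few lines of arithmetic. The sketch below sums the three counts (30,519 in total, consistent with the "more than 30,000" figure in the summary) and prints each split's share; the dictionary keys are illustrative labels, not necessarily the split names used by any loader.

```python
# Split sizes as listed in the table above.
splits = {"train": 23964, "validation": 2992, "test": 3563}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<10} {count:>6}  {shares[name]:.1%}")
print(f"{'total':<10} {total:>6}")
```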

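Dataset creation

Source data

The dataset is built on the ROCStories corpus, which was used to generate why-questions about the actions and events in the stories.

Annotation process

Annotation was carried out by Amazon Mechanical Turk workers. Three answers were collected for each question, and for a small subset of questions the answers were validated in a second round of annotation.

Considerations for using the data

Evaluation

To evaluate progress on this dataset, the authors recommend human evaluation and provide an evaluation suite. After running inference on the test set, select the answers that need human evaluation, that is, those for questions whose human_eval flag is set to True.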
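
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation suite: the function name `select_human_eval` is hypothetical, and using `question_meta` as the deduplication key with a keep-first policy is an assumption based on the field being described as unique per question.

```python
from typing import Dict, Iterable, List


def select_human_eval(records: Iterable[Dict]) -> List[Dict]:
    """Keep one record per question among those flagged for human evaluation."""
    seen = set()
    selected = []
    for record in records:
        if not record.get("human_eval"):
            continue  # not part of the human-evaluation subset
        key = record["question_meta"]  # assumed deduplication key
        if key in seen:
            continue  # a record for this question was already kept
        seen.add(key)
        selected.append(record)
    return selected


# Toy records for illustration only; ids and answers are made up.
test_records = [
    {"question_meta": "q1", "answer": "a", "human_eval": True},
    {"question_meta": "q1", "answer": "b", "human_eval": True},
    {"question_meta": "q2", "answer": "c", "human_eval": False},
    {"question_meta": "q3", "answer": "d", "human_eval": True},
]

subset = select_human_eval(test_records)
print([record["question_meta"] for record in subset])
```

In practice the records would be a model's predictions joined with the test split, so that each kept record carries the generated answer to be rated.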
