A Corpus for Commonsense Inference in the Story Cloze Test
Dataset Overview
Dataset Name
A Corpus for Commonsense Inference in the Story Cloze Test
Dataset Format
- Arrow format: located in the baseline_data/ directory
- Raw text format: located in the baseline_texts/ directory
Dataset Contents
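The dataset is built on the Story Cloze Test (SCT), a benchmark designed for training and evaluating machine learning algorithms on narrative understanding and inference. It contains 1,871 stories, each labeled by three human annotators. Given the first four sentences of a story, annotators chose which of four types of narrative inference is used to decide the ending sentence, and which sentence contributes most to that inference.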
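Because every story carries three category labels, agreement between annotators can be quantified. The sketch below computes Fleiss' kappa, a standard statistic for a fixed number of raters per item. It is an illustration only: this card does not say which agreement measure the authors used, and the integer category ids 0 to 3 are hypothetical stand-ins for the four inference types.

```python
from collections import Counter

NUM_CATEGORIES = 4  # four narrative-inference types, encoded as hypothetical ids 0..3


def fleiss_kappa(labels_per_story, num_categories=NUM_CATEGORIES):
    """Fleiss' kappa for items that all have the same number of raters.

    labels_per_story: one inner list per story holding the category id chosen
    by each annotator (three annotators per story in this corpus).
    """
    n_items = len(labels_per_story)
    n_raters = len(labels_per_story[0])
    if any(len(row) != n_raters for row in labels_per_story):
        raise ValueError("every story must have the same number of annotators")

    # counts[i][j] = number of annotators who put story i in category j
    counts = []
    for row in labels_per_story:
        c = Counter(row)
        counts.append([c.get(j, 0) for j in range(num_categories)])

    # Observed agreement per story, then averaged over all stories
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category
    totals = [sum(row[j] for row in counts) for j in range(num_categories)]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when every label is the same category")

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels for four stories, three annotators each (made-up data)
    print(fleiss_kappa([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]))  # 1.0
```

A value of 1 means perfect agreement; values near 0 mean agreement no better than chance given the category frequencies.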
Related Tasks
- Narrative inference category prediction
- Multi-label classification of contributing sentences
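Both tasks need a single training target per story, derived from the three annotators' labels. The following is a minimal sketch under stated assumptions, not the authors' preprocessing: the category target is a strict majority vote (ties return None), and the contributing-sentence target is a multi-hot vector over the first four sentences in which a sentence is positive if at least `min_votes` annotators picked it. The function names, the 1-based sentence indexing, and the `min_votes` threshold are all hypothetical.

```python
from collections import Counter

NUM_SENTENCES = 4  # annotators see the first four sentences of each story


def majority_category(votes):
    """Category chosen by a strict majority of annotators, or None if there is none."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def contributing_targets(picks_per_annotator, min_votes=1):
    """Multi-hot vector over sentences 1..4.

    picks_per_annotator: one list per annotator with the 1-based indices of the
    sentence(s) that annotator marked as contributing to the inference.
    """
    # Count each sentence at most once per annotator
    counts = Counter(s for picks in picks_per_annotator for s in set(picks))
    return [1 if counts[s] >= min_votes else 0 for s in range(1, NUM_SENTENCES + 1)]


if __name__ == "__main__":
    print(majority_category([2, 2, 0]))            # 2
    print(contributing_targets([[3], [3], [4]]))   # [0, 0, 1, 1]
```

With `min_votes=1` the target is the union of all annotators' picks; raising it to 2 keeps only sentences that a majority agreed on.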
Intended Use
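Predicting a story's narrative inference category and its contributing sentences, and evaluating model performance on the original Story Cloze Test task.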
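The card does not specify evaluation metrics, so the sketch below assumes two common choices: accuracy for single-label predictions and micro-averaged F1 for the multi-hot contributing-sentence vectors. The same `accuracy` function would also score the original SCT, where a system picks the correct one of two candidate endings, if gold and predicted endings are encoded as indices. Function names and encodings are illustrative assumptions.

```python
def accuracy(gold, pred):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def micro_f1(gold_vectors, pred_vectors):
    """Micro-averaged F1 over multi-hot (0/1) vectors, pooling counts across stories."""
    tp = fp = fn = 0
    for g_vec, p_vec in zip(gold_vectors, pred_vectors):
        for g, p in zip(g_vec, p_vec):
            if g == 1 and p == 1:
                tp += 1
            elif g == 0 and p == 1:
                fp += 1
            elif g == 1 and p == 0:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(accuracy([0, 1, 2, 3], [0, 1, 1, 3]))  # 0.75
    print(micro_f1([[0, 0, 1, 1], [1, 0, 0, 0]],
                   [[0, 0, 1, 0], [1, 1, 0, 0]]))
```

Micro-averaging pools true and false positives across all stories, so sentence positions that are labeled positive more often carry more weight than in a macro average.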
Citation
@inproceedings{yao-etal-2022-corpus,
    title = "A Corpus for Commonsense Inference in Story Cloze Test",
    author = "Yao, Bingsheng and
      Joseph, Ethan and
      Lioanag, Julian and
      Si, Mei",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.375",
    pages = "3500--3508",
    abstract = "The Story Cloze Test (SCT) is designed for training and evaluating machine learning algorithms for narrative understanding and inferences. The SOTA models can achieve over 90{\%} accuracy on predicting the last sentence. However, it has been shown that high accuracy can be achieved by merely using surface-level features. We suspect these models may not \textit{truly} understand the story. Based on the SCT dataset, we constructed a human-labeled and human-verified commonsense knowledge inference dataset. Given the first four sentences of a story, we asked crowd-source workers to choose from four types of narrative inference for deciding the ending sentence and which sentence contributes most to the inference. We accumulated data on 1871 stories, and three human workers labeled each story. Analysis of the intra-category and inter-category agreements show a high level of consensus. We present two new tasks for predicting the narrative inference categories and contributing sentences. Our results show that transformer-based models can reach SOTA performance on the original SCT task using transfer learning but don{'}t perform well on these new and more challenging tasks.",
}




