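A Question Answering Dataset Based on Video Content and Causal Relations

Name: A Question Answering Dataset Based on Video Content and Causal Relations
Creator: Shanghai Jiao Tong University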
Published: 2024-03-05T17:40:54+08:00

Indexed by the National Basic Science Data Center on 2024-03-05

Film and video content analysis

Causal reasoning

Data link:

https://www.nbsdc.cn/general/dataDetail?id=64edfca7bb16e0300cd4df8c&type=1

Resource description:

This dataset supports the Causal Video Question Answering (Causal-VidQA) task. Given a video clip, a model must answer four types of questions together: descriptive questions (scene description), explanatory questions (reasoning from factual evidence), and predictive and counterfactual questions (reasoning from commonsense knowledge).

Annotation proceeded in three stages: object localization and tracking in the videos, question and answer annotation by human annotators, and generation of the multiple-choice question-answer sets. The source videos come from Kinetics-700, the largest video action dataset, from which we retained 546,882 videos longer than 9 seconds.

1. Object localization and tracking: key objects in each video were annotated and tracked by combining image object recognition and video object recognition models.
2. Question-answer annotation: annotators worked in pairs. One annotator (the questioner) asked the questions, and the other (the answerer) answered and verified them. We selected 26,900 videos for questioning and obtained 107,600 question-answer pairs in total.
3. Multiple-choice generation: the dataset is evaluated as multiple-choice question answering. Distractor candidates are generated as follows:
   a) Group the questions by question type.
   b) For each question, retrieve the 50 most similar questions in the same group, ranked by cosine similarity of BERT sentence-level representations, and take their answers as distractor candidates.
   c) Remove candidates that are too similar to the correct answer, judged by 1) stemming and 2) a feature-vector cosine similarity greater than 0.9.
   d) Draw four of the remaining candidates as the distractors.
   e) Insert the correct answer at a uniformly random position among them, giving five options.
   f) Manually check every option set to ensure that each question has exactly one correct answer.

The dataset files total 61.5 GB.
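Steps a) through e) of the distractor generation can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the dataset authors' code. The function names (`build_options`, `normalize`, `cosine`) and the dictionary keys are hypothetical. Question and answer embeddings are assumed to be precomputed (for example from BERT) and passed in as plain lists of floats. `normalize` is a crude stand-in for a real stemmer. The sketch also assumes that either criterion in step c) is enough to remove a candidate, that duplicate distractors are dropped, and that a question with fewer than four surviving candidates is skipped; the description above does not specify these points. Step f) is a manual check and is not modeled.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalize(text):
    """Crude stand-in for stemming: lowercase, drop punctuation, strip a plural 's'."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    words = cleaned.split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def build_options(items, top_k=50, sim_threshold=0.9, n_distractors=4, seed=0):
    """Build multiple-choice options for every question.

    items: list of dicts with hypothetical keys
        "qtype", "q_vec", "answer", "a_vec".
    Returns a list aligned with items; each entry is (options, correct_index),
    or None when too few distractors survive filtering (an assumption).
    """
    rng = random.Random(seed)

    # a) group question indices by question type
    groups = defaultdict(list)
    for idx, item in enumerate(items):
        groups[item["qtype"]].append(idx)

    results = [None] * len(items)
    for indices in groups.values():
        for i in indices:
            target = items[i]
            # b) top-k most similar questions within the same group
            ranked = sorted(
                (j for j in indices if j != i),
                key=lambda j: cosine(target["q_vec"], items[j]["q_vec"]),
                reverse=True,
            )[:top_k]

            # c) drop answers too close to the correct answer
            seen = {normalize(target["answer"])}
            pool = []
            for j in ranked:
                cand = items[j]
                key = normalize(cand["answer"])
                if key in seen:
                    # same normalized form as the correct answer, or as an
                    # earlier candidate (the latter is an added assumption)
                    continue
                if cosine(target["a_vec"], cand["a_vec"]) > sim_threshold:
                    continue
                seen.add(key)
                pool.append(cand["answer"])

            if len(pool) < n_distractors:
                continue  # not specified in the description; skipped here

            # d) draw four distractors, e) insert the answer at a random slot
            options = rng.sample(pool, n_distractors)
            pos = rng.randrange(n_distractors + 1)
            options.insert(pos, target["answer"])
            results[i] = (options, pos)
    return results
```

Calling `build_options(items)` on a list of question records returns, for each question, the five options together with the index of the correct one. Fixing `seed` makes the sampling reproducible, which helps when the option sets are later reviewed by hand.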

Provided by:

Shanghai Jiao Tong University

Collection summary

Dataset introduction