A2D Sentences (Sentences for the Actor-Action Dataset (A2D))
Source: OpenDataLab. Updated 2026-05-24; indexed 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/A2D_Sentences
Description:
The Actor-Action Dataset (A2D) by Xu et al. [29] is the largest video dataset for the general actor and action segmentation task. It contains 3,782 videos from YouTube with pixel-level labels for actors and their actions. The dataset covers eight action categories performed by seven actor categories. We follow the split in [29]: 3,036 training videos and 746 test videos. Because we are interested in pixel-level actor and action segmentation from sentences, we augment the A2D videos with natural language descriptions of what each actor is doing. Following the guidelines in [12], when a video contains multiple objects, we ask our annotators to write a discriminative referring expression for each actor instance.
The annotation process yields 6,656 sentences containing 811 distinct nouns, 225 verbs, and 189 adjectives. Our sentences enrich the actor-action pairs in A2D at a finer granularity. For example, the actor category "adult" may be described as "man", "woman", "person", or "player" in our sentences, and the action category "rolling" may correspond to flipping, sliding, moving, or running, depending on the actor and the scene. Our sentences contain more words on average than those in the ReferIt dataset [12] (7.3 vs. 4.7), even when prepositions, articles, and linking verbs are excluded (4.5 vs. 3.6). This is expected, since our sentences use a wide variety of verbs, whereas existing referring expression datasets largely ignore verbs.
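The average-length comparison above (with and without prepositions, articles, and linking verbs) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the tokenizer, the `FUNCTION_WORDS` stop list, and the two example sentences are all assumptions made up for this sketch, and the source does not say which words were actually excluded.

```python
import re

# Hypothetical stop list (assumption): a few prepositions, articles, and
# linking verbs. The source does not specify the words it excluded.
FUNCTION_WORDS = frozenset({
    "in", "on", "at", "of", "to", "with", "by", "from",  # prepositions
    "a", "an", "the",                                    # articles
    "is", "are", "was", "were", "be",                    # linking verbs
})


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def mean_length(sentences, exclude=frozenset()):
    """Return the mean number of tokens per sentence, skipping excluded words."""
    if not sentences:
        raise ValueError("need at least one sentence")
    lengths = [
        sum(1 for word in tokenize(s) if word not in exclude)
        for s in sentences
    ]
    return sum(lengths) / len(lengths)


# Made-up sentences in the style of a referring expression; they are NOT
# taken from A2D Sentences.
example_sentences = [
    "a man in a red shirt is running on the field",
    "the small dog is rolling",
]

all_words = mean_length(example_sentences)
content_words = mean_length(example_sentences, exclude=FUNCTION_WORDS)
print(all_words, content_words)
```

Running the same two calls over the full sentence list of each dataset would give the kind of pairwise comparison reported in the description, although the exact figures depend on the tokenizer and the stop list chosen.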
Provider:
OpenDataLab
Created:
2022-08-19
Collected summary
Dataset introduction

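Background and challenges
Background overview
A2D Sentences is a dataset for segmentation from textual referring expressions, built on the Actor-Action Dataset (A2D). It comprises 3,782 YouTube videos with pixel-level actor and action labels, plus 6,656 natural language sentences describing what the actors in the videos are doing. The sentences enrich the original dataset at a finer granularity and average 7.3 words, longer than those in the comparable ReferIt dataset (4.7 words). The dataset supports actor and action video segmentation from sentences.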
The content above was collected and summarized by 遇见数据集.



