Replication Data for: Summarizing First-Person Videos from Third Persons' Points of Views
Source: DataCite Commons. Updated: 2022-06-14. Indexed: 2025-04-16
Download link:
https://dataverse.lib.nycu.edu.tw/citation?persistentId=doi:10.57770/WBH60V
Description:
Video highlighting, or summarization, is an active topic in computer vision that benefits applications such as viewing, searching, and storage. However, most existing studies rely on training data from third-person videos, so the resulting models do not easily generalize to first-person videos. With the goal of deriving an effective model for summarizing first-person videos, we propose a novel deep neural network architecture that describes and discriminates vital spatiotemporal information across videos with different points of view. Our model is trained in a semi-supervised setting, in which fully annotated third-person videos, unlabeled first-person videos, and a small number of annotated first-person videos are available during training. We present qualitative and quantitative evaluations on both benchmark datasets and first-person video datasets that we collected.
Provider:
NYCU Dataverse
Created:
2022-06-14



