Dataset for "NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos"

Name: Dataset for "NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos"
Creator: SMU Research Data Repository (RDR)
Published: 2023-10-31
License: No description provided

Source: researchdata.smu.edu.sg (updated 2023-10-31; indexed 2025-01-15)

Download link:

https://researchdata.smu.edu.sg/articles/dataset/Dataset_for_NPF-200_A_Multi-Modal_Eye_Fixation_Dataset_and_Method_for_Non-Photorealistic_Videos_/24447691/1

Description:

Dataset for NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos. The full code repository is available on GitHub: https://github.com/Yangziyu/NPF200

Non-photorealistic videos are in growing demand with the rise of the metaverse, yet they have received little research attention. This work takes a step toward understanding how humans perceive non-photorealistic videos through eye fixation (i.e., saliency detection), which is critical for enhancing media production, artistic design, and game user experience. To address the lack of a suitable dataset for this line of research, we present NPF-200, the first large-scale multi-modal dataset of purely non-photorealistic videos with eye fixations. The dataset has three characteristics: 1) it includes soundtracks, which vision and psychology studies have shown to be essential; 2) it covers diverse semantic content with high-quality videos; and 3) it contains rich motion both across and within videos. We conduct a series of analyses to gain deeper insight into this task and compare several state-of-the-art methods to explore the gap between natural images and non-photorealistic data.

Additionally, because the human attention system tends to extract visual and audio features at different frequencies, we propose NPSNet, a universal frequency-aware multi-modal non-photorealistic saliency detection model, which achieves state-of-the-art performance on this task. The results reveal strengths and weaknesses of multi-modal network design and multi-domain training, opening up promising directions for future work. Our dataset and code can be found at https://github.com/Yangziyu/NPF200

Provided by:

SMU Research Data Repository (RDR)