
Dataset for "NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos"

Source: DataCite Commons | Updated: 2023-10-31 | Indexed: 2024-07-13
Download link:
https://researchdata.smu.edu.sg/articles/dataset/Dataset_for_NPF-200_A_Multi-Modal_Eye_Fixation_Dataset_and_Method_for_Non-Photorealistic_Videos_/24447691
Resource description:
Dataset for NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos. The full code repository is available on GitHub: https://github.com/Yangziyu/NPF200

Non-photorealistic videos are in growing demand with the wave of the metaverse, yet they remain under-studied. This work takes a step toward understanding how humans perceive non-photorealistic videos through eye fixation (i.e., saliency detection). This understanding is critical for enhancing media production, artistic design, and game user experience.

To fill the gap left by the lack of a suitable dataset for this line of research, we present NPF-200, the first large-scale multi-modal dataset of purely non-photorealistic videos with eye fixations. The dataset has three characteristics: 1) it contains soundtracks, which vision and psychology studies show to be essential; 2) it covers diverse semantic content, and the videos are of high quality; 3) it has rich motion both across and within videos.

We conduct a series of analyses to gain deeper insight into this task. We also compare several state-of-the-art methods to explore the gap between natural images and non-photorealistic data. The human attention system tends to extract visual and audio features at different frequencies. Building on this, we propose NPSNet, a universal frequency-aware multi-modal non-photorealistic saliency detection model that achieves state-of-the-art performance on this task. The results uncover the strengths and weaknesses of multi-modal network design and multi-domain training, opening up promising directions for future work. Our dataset and code can be found at https://github.com/Yangziyu/NPF200

Provider: SMU Research Data Repository (RDR)
Created: 2023-10-31