TextVR
Source: arXiv. Updated: 2023-05-05. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2305.03347v1
Resource description:
TextVR is a large-scale cross-modal video retrieval dataset containing 42.2k sentence queries and 10.5k videos across 8 scenario domains. To retrieve the correct video, a model must understand both the visual content and the textual semantics. The videos were collected from YouTube and from existing datasets, then annotated by a professional team. The dataset mainly targets retrieval problems in the video-and-language domain, especially scenarios that require reading and comprehending text.
Provided by:
Zhejiang University
Created:
2023-05-05



