Flickr30K-CFQ
Source: arXiv | Updated: 2024-04-01 | Indexed: 2024-06-21
Download link:
https://sites.google.com/view/Flickr30K-cfq
Resource overview:
Flickr30K-CFQ is a dataset for text-image retrieval, created by the Intelligent Robotics Research Center of Zhejiang Lab. It contains 31,783 images and provides compact, fragmented queries intended to simulate real-world text-image retrieval. The queries span several granularities (image tags, phrases, triples, and segments) to match different user query styles. During construction, the large multimodal model LLaVA was used to generate abstract image descriptions, and the StanfordNLP OpenIE component was used to extract triples and then generate segments. Intended applications include multimedia information retrieval, recommendation systems, and intelligent assistants. The dataset aims to address the shortcomings of existing datasets on real-world text-image retrieval tasks.
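To make the idea of query granularities concrete, here is a minimal Python sketch that derives tag, phrase, and triple queries from a single (subject, relation, object) triple. It is illustrative only: the `Triple` type, the `queries_from_triple` helper, and the example triple are hypothetical, and the listing does not specify how the dataset's authors actually formatted their queries or built segments.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, relation, object) triple, as an OpenIE-style extractor might emit."""
    subject: str
    relation: str
    obj: str


def queries_from_triple(t: Triple) -> dict[str, list[str]]:
    """Derive compact queries at several granularities from one triple.

    Hypothetical helper: the granularity names mirror the listing, but the
    string formats here are assumptions, not the dataset's real format.
    """
    return {
        # Coarsest granularity: single-word tags for the entities involved.
        "tags": [t.subject, t.obj],
        # Partial phrases that drop one element of the triple.
        "phrases": [f"{t.subject} {t.relation}", f"{t.relation} {t.obj}"],
        # The full triple flattened into one short query string.
        "triple": [f"{t.subject} {t.relation} {t.obj}"],
    }


# Made-up example triple, not taken from the dataset.
example = Triple("dog", "catches", "frisbee")
queries = queries_from_triple(example)
```

Each list in `queries` could then be issued against a retrieval model to compare how well it handles short, fragmented input versus full captions.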
Provided by:
Intelligent Robotics Research Center of Zhejiang Lab
Created:
2024-03-20



