Flickr Image dataset
Source: www.kaggle.com | Updated: 2018-06-12 | Indexed: 2025-01-08
Download link:
https://www.kaggle.com/hsankesara/flickr-image-dataset
Description:
The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects. While our baseline rivals in accuracy more complex state-of-the-art models, we show that its gains cannot be easily parlayed into improvements on such tasks as image-sentence retrieval, thus underlining the limitations of current methods and the need for further research.
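The baseline described above ranks candidate boxes for each phrase by combining several cues. The sketch below shows one way such cues could be merged. It assumes a weighted linear sum of per-cue scores plus an area-based size prior. The cue names, weights, and the `Candidate`, `localize`, and `iou` helpers are hypothetical illustrations, not the authors' implementation. The `iou` helper reflects the intersection-over-union criterion commonly used to judge whether a predicted box localizes a phrase correctly (typically IoU ≥ 0.5).

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# A box as (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Candidate:
    box: Box
    # Hypothetical per-cue scores, e.g. "embedding", "detector", "color".
    cues: Dict[str, float]


def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    # Clamp at zero so empty or inverted boxes have no area.
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def localize(candidates: Sequence[Candidate], weights: Dict[str, float],
             image_area: float, size_weight: float) -> Box:
    """Pick the box with the highest weighted cue sum plus a size prior.

    This is an illustrative linear combination (assumed, not the paper's
    exact formulation). Raises ValueError if `candidates` is empty.
    """
    def score(c: Candidate) -> float:
        cue_total = sum(weights.get(name, 0.0) * value
                        for name, value in c.cues.items())
        # Bias towards larger objects: reward the fraction of the image covered.
        return cue_total + size_weight * box_area(c.box) / image_area

    return max(candidates, key=score).box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scores for two candidate boxes in a 100x100 image.
    cands = [
        Candidate((0, 0, 10, 10), {"embedding": 0.6, "detector": 0.2, "color": 0.1}),
        Candidate((0, 0, 50, 40), {"embedding": 0.5, "detector": 0.3, "color": 0.0}),
    ]
    weights = {"embedding": 1.0, "detector": 0.5, "color": 0.5}
    best = localize(cands, weights, image_area=100 * 100, size_weight=1.0)
    ground_truth = (0, 0, 50, 50)
    print(best, iou(best, ground_truth) >= 0.5)
```

In this toy input the smaller box has the higher cue sum. The size prior tips the decision to the larger box, which illustrates the "bias towards selecting larger objects" mentioned in the abstract.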
Provider:
Kaggle



