TextCaps
Source: arXiv | Updated: 2020-08-04 | Indexed: 2024-07-25
Download link:
https://textvqa.org/textcaps/
Overview:
TextCaps is a dataset created by Facebook AI Research to advance image captioning with reading comprehension. It contains 145,329 captions for 28,408 images and requires models to recognize the text in an image and relate it to the visual context in order to produce a coherent description. TextCaps challenges models to reason spatially, semantically, and visually over multiple text tokens and visual entities, and it emphasizes deciding when text should be copied directly into a caption and when it should be paraphrased. Applications include helping visually impaired people understand image content and improving how visual question answering systems interpret text in images.
Provider:
Facebook AI Research
Created:
2020-03-24



