MultiSubs
Source: arXiv. Updated: 2022-06-17. Indexed: 2024-06-21.
Download link:
https://doi.org/10.5281/zenodo.5034604
Description:
MultiSubs is a large-scale multimodal and multilingual dataset created at Imperial College London. It aims to facilitate research on grounding words to images in their contextual usage in language. The dataset consists of sentences taken from movie subtitles, paired with images selected to illustrate unambiguously the concepts those sentences express. Its key features are: (1) images are aligned to text fragments rather than to whole sentences; (2) a single text fragment or sentence may have multiple images; (3) the sentences are free-form and resemble real-world text; (4) the parallel texts are multilingual. A fill-in-the-blank game is used to evaluate the quality of the automatic image selection process, and the dataset's utility is demonstrated on two automatic tasks: fill-in-the-blank and lexical translation. The results indicate that images can be a useful complement to textual context. MultiSubs is particularly suited to research on the visual grounding of words in free-form sentences. It is available at the download link given above and is intended for research purposes.
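To make the fill-in-the-blank setup concrete, here is a minimal Python sketch of how one such instance could be built from a sentence whose fragment is aligned to several images. It does not reflect the dataset's actual file format: the `GroundedSentence` record, its field names, the `make_blank` helper, the sample sentence, the translation, and the image identifiers are all hypothetical and invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GroundedSentence:
    """Hypothetical record: one sentence, one illustrated fragment, several images."""
    sentence: str                      # free-form source-language sentence
    fragment: str                      # text fragment the images are aligned to
    translations: dict[str, str]       # language code -> translation of the fragment
    image_ids: list[str] = field(default_factory=list)  # multiple images allowed


def make_blank(record: GroundedSentence, blank: str = "_____") -> tuple[str, str]:
    """Blank out the first occurrence of the fragment; return (masked sentence, answer)."""
    if record.fragment not in record.sentence:
        raise ValueError("fragment does not occur in sentence")
    masked = record.sentence.replace(record.fragment, blank, 1)
    return masked, record.fragment


# Invented sample data, not taken from MultiSubs.
example = GroundedSentence(
    sentence="he put the trunk in the car",
    fragment="trunk",
    translations={"fr": "malle"},
    image_ids=["img_001", "img_002"],
)

masked, answer = make_blank(example)
print(masked)  # he put the _____ in the car
```

In this sketch, a model or a human player would see `masked` (optionally together with the images listed in `image_ids`) and try to recover `answer`. The lexical translation task would instead ask for the entry in `translations` for a given target language.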
Provider:
Imperial College London
Created:
2021-03-03



