MultiSubs: A Large-scale Multimodal and Multilingual Dataset
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5034604
Description:
MultiSubs is a dataset of multilingual subtitles gathered from the OPUS OpenSubtitles dataset, which in turn was sourced from opensubtitles.org. We have supplemented some text fragments (visually salient nouns in this release) within the subtitles with web images, where the word sense of the fragment has been disambiguated using a cross-lingual approach.
Please refer to our paper for a more detailed description of the dataset:
Josiah Wang, Pranava Madhyastha, Josiel Figueiredo, Chiraag Lala, Lucia Specia (2021). MultiSubs: A Large-scale Multimodal and Multilingual Dataset. CoRR, abs/2103.01910. Available at: https://arxiv.org/abs/2103.01910
Created:
2021-06-30



