Hindi Visual Genome

Name: Hindi Visual Genome
Creator: Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University
Published: 2019-07-21 18:00:28
License: No description available

Source: arXiv. Updated: 2019-07-21. Indexed: 2024-06-21.

Download link:

http://hdl.handle.net/11234/1-2997

Description:

Hindi Visual Genome is a dataset for English-to-Hindi multimodal machine translation, created by the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University. It contains 31,525 entries, each consisting of an English segment, its Hindi translation, the associated image, and a rectangular region within that image. The Hindi text was produced by automatic translation followed by manual post-editing to ensure translation quality. The dataset is used mainly in machine translation research, especially where an image is needed to resolve ambiguity in the text, such as translating image captions for online news articles.
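The entry structure described above (English text, Hindi translation, image, rectangular region) can be sketched as a small record type. This is a minimal illustration, not the dataset's documented schema: the tab-separated layout, the column order, the `Entry` and `parse_entries` names, and the sample row are all assumptions made up for the example, so check the files from the download link before relying on them.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Entry:
    """One dataset item; field names and order are assumed, not official."""
    image_id: str
    x: int        # left edge of the rectangular region, in pixels
    y: int        # top edge of the rectangular region, in pixels
    width: int
    height: int
    english: str
    hindi: str


def parse_entries(stream):
    """Yield Entry objects from a tab-separated text stream (assumed layout)."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 7:
            continue  # skip rows that do not match the assumed 7-column layout
        image_id, x, y, w, h, en, hi = row
        yield Entry(image_id, int(x), int(y), int(w), int(h), en, hi)


# Invented sample row for illustration only; it is not taken from the dataset.
sample = "101\t10\t20\t300\t150\ta red bus on the street\tसड़क पर एक लाल बस\n"
entries = list(parse_entries(io.StringIO(sample)))
```

Keeping the region as four integers next to the two text fields makes it straightforward to crop the image before passing it to a multimodal translation model.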

Provider:

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University

Created:

2019-07-21
