MS-COCO/STAIR Comparable

Name: MS-COCO/STAIR Comparable
Creator: College of Arts and Sciences, University of Georgia
Published: 2020-10-17 14:12:25
License: No description available

Source: arXiv. Updated: 2020-10-17. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/2010.08725v1


Resource description:

MS-COCO/STAIR Comparable is a dataset created by institutions including the College of Arts and Sciences at the University of Georgia. It contains 123,287 English-Japanese comparable sentence pairs drawn from the MS-COCO and STAIR image captioning datasets; the two sentences in each pair describe the same image in different languages. To build it, the training and validation data of the two source datasets were merged, and one English and one Japanese description was selected for each image. The dataset is intended mainly for research on multimodal neural machine translation, which combines image and text information to improve translation quality, particularly when large-scale parallel corpora are unavailable.
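
The construction step described above (merge the training and validation data, then keep one English and one Japanese caption per image) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' actual procedure: it assumes both sources use COCO-style caption annotations with `image_id` and `caption` fields, and it assumes the pair is chosen by taking the first caption listed for each image, since the description does not say how the pair was selected. The function names and the toy records are hypothetical.

```python
def first_caption_per_image(splits):
    """Merge annotation splits and keep the first caption seen for each image.

    Each split is assumed to be a dict with an "annotations" list whose items
    carry "image_id" and "caption" keys (COCO-style caption annotations).
    """
    captions = {}
    for split in splits:
        for ann in split["annotations"]:
            # setdefault keeps the earliest caption and ignores later ones.
            captions.setdefault(ann["image_id"], ann["caption"].strip())
    return captions


def build_comparable_pairs(en_splits, ja_splits):
    """Return (image_id, english, japanese) for images captioned in both languages."""
    en = first_caption_per_image(en_splits)
    ja = first_caption_per_image(ja_splits)
    return [(img_id, en[img_id], ja[img_id]) for img_id in en if img_id in ja]


# Toy records standing in for the real train/validation annotation files.
en_train = {"annotations": [
    {"image_id": 1, "caption": "A cat sits on a sofa."},
    {"image_id": 1, "caption": "A cat on a couch."},
]}
en_val = {"annotations": [
    {"image_id": 2, "caption": "Two dogs run on a beach."},
    {"image_id": 3, "caption": "A man rides a bicycle."},
]}
ja_train = {"annotations": [
    {"image_id": 1, "caption": "ソファの上に猫が座っている。"},
]}
ja_val = {"annotations": [
    {"image_id": 2, "caption": "二匹の犬が砂浜を走っている。"},
]}

pairs = build_comparable_pairs([en_train, en_val], [ja_train, ja_val])
for image_id, english, japanese in pairs:
    print(image_id, english, japanese)
```

In this toy run, image 3 is dropped because it has no Japanese caption. The two sentences in a pair are independent descriptions of the same image rather than translations of each other, which is why such pairs are called comparable rather than parallel.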

Provider:

College of Arts and Sciences, University of Georgia

Created:

2020-10-17
