Crossmodal-3600

Name: Crossmodal-3600
Creator: Google Research
Published: 2022-10-10 18:39:10
License: No description available

arXiv, updated 2022-10-10, indexed 2024-06-21

Download link:

https://google.github.io/crossmodal-3600/

Resource overview:

Crossmodal-3600 (XM3600) is a multilingual, multimodal evaluation dataset created by Google Research. It contains 3,600 images from around the world, each paired with human-generated reference captions in 36 languages. The dataset is intended as a high-quality evaluation benchmark for large-scale multilingual image captioning research. The images were drawn from the regions where the 36 languages are spoken, which gives the set cultural diversity. The annotation protocol was designed so that captions are written directly in each language rather than translated, which avoids translation artifacts while keeping caption style consistent across languages. Applications of XM3600 include, but are not limited to, evaluating and selecting image captioning models and supporting better accessibility solutions for visually impaired users.

Provider:

Google Research

Created:

2022-05-25
