ArtELingo
OpenDataLab · Updated 2026-05-24 · Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/ArtELingo
Resource summary:
This paper introduces ArtELingo, a new benchmark and dataset designed to encourage work on diversity across languages and cultures. It builds on ArtEmis, a collection of 80,000 WikiArt artworks with 0.45M emotion labels and English-only captions. ArtELingo adds another 0.79M annotations in Arabic and Chinese, plus 4.8K in Spanish to evaluate "cultural-transfer" performance. More than 51,000 artworks have five or more annotations in three languages. This diversity makes it possible to study similarities and differences across languages and cultures. We also study the captioning task and find that this diversity improves the performance of baseline models. ArtELingo is publicly available, with standard splits and baseline models. We hope our work will help streamline future research on multilingual and culturally aware AI.
Provided by:
OpenDataLab
Created:
2023-02-01
Dataset introduction

Background and challenges
Background overview
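ArtELingo is a multilingual art-annotation dataset and benchmark that extends the ArtEmis dataset. It adds new Arabic, Chinese, and Spanish annotations, and more than 51k artworks carry five or more annotations in three languages. The dataset aims to promote research on cross-lingual and cross-cultural diversity. On the captioning task, this diversity was found to improve baseline model performance. The dataset is publicly available.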
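The coverage figure above counts artworks that are densely annotated in several languages. The sketch below shows one way such a count could be computed. It reads the statistic as "at least five annotations in each of the three languages", which is an interpretation, not something this page states.

Several details are hypothetical and not taken from ArtELingo's released files or tooling:
- the record layout `(artwork_id, language, emotion, caption)`;
- the language labels;
- the function name `artworks_with_min_coverage`.

```python
from collections import defaultdict

# Hypothetical annotation record: (artwork_id, language, emotion, caption).
# The actual ArtELingo file layout and field names may differ.
Annotation = tuple[str, str, str, str]


def artworks_with_min_coverage(
    annotations: list[Annotation],
    languages: tuple[str, ...] = ("english", "arabic", "chinese"),
    min_per_language: int = 5,
) -> set[str]:
    """Return ids of artworks with at least `min_per_language` annotations
    in every language listed in `languages` (hypothetical helper)."""
    # counts[artwork_id][language] -> number of annotations seen
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for artwork_id, language, _emotion, _caption in annotations:
        counts[artwork_id][language] += 1
    # Keep only artworks that meet the threshold in all required languages.
    return {
        artwork_id
        for artwork_id, per_language in counts.items()
        if all(per_language[lang] >= min_per_language for lang in languages)
    }


if __name__ == "__main__":
    # Toy data: "a1" has 5 annotations in each language, "a2" has a single English one.
    sample = [
        ("a1", lang, "awe", "caption text")
        for lang in ("english", "arabic", "chinese")
        for _ in range(5)
    ]
    sample.append(("a2", "english", "contentment", "caption text"))
    print(artworks_with_min_coverage(sample))  # {'a1'}
```

The helper uses per-language counts rather than a single total. This means an artwork with many English captions but none in Arabic or Chinese is not counted as multilingual.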