Multi30K
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/Multi30K
Resource description:
Multi30K is an extension of the Flickr30K dataset (Young et al., 2014) with 31,014 German translations of English descriptions and 155,070 independently collected German descriptions. The translations were collected from professionally contracted translators, while the descriptions were collected from untrained crowdworkers. The key difference between the two corpora is the relationship between sentences in different languages. In the translated corpus, the sentences in the two languages are known to correspond strongly. In the description corpus, all that is known is that the sentences, regardless of language, are supposed to describe the same image.
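The difference between the two corpora can be illustrated with a minimal Python sketch. The class names, field names, image ID, and example sentences below are hypothetical and are not taken from the dataset's actual file layout; the sketch only shows why a translation record yields one aligned sentence pair, while a description record yields sentence pairs that share nothing but the image.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple


@dataclass
class TranslationRecord:
    """Hypothetical record for the translated corpus: `de` translates `en`."""
    image_id: str
    en: str
    de: str


@dataclass
class DescriptionRecord:
    """Hypothetical record for the description corpus: sentences in each
    language were written independently and share only the image."""
    image_id: str
    en: List[str] = field(default_factory=list)
    de: List[str] = field(default_factory=list)


def aligned_pairs(records: List[TranslationRecord]) -> List[Tuple[str, str]]:
    # Each record gives exactly one (English, German) pair with strong
    # sentence-level correspondence.
    return [(r.en, r.de) for r in records]


def comparable_pairs(record: DescriptionRecord) -> List[Tuple[str, str]]:
    # Any English sentence can be paired with any German sentence for the
    # same image, but the pairs are only comparable, not translations.
    return list(product(record.en, record.de))


# Invented example data, for illustration only.
translated = [
    TranslationRecord("img_0001", "A dog runs on the beach.",
                      "Ein Hund rennt am Strand."),
]
described = DescriptionRecord(
    "img_0001",
    en=["A dog runs on the beach.", "A brown dog near the sea."],
    de=["Ein Hund spielt im Sand.", "Ein Tier läuft am Wasser entlang."],
)

print(aligned_pairs(translated))
print(len(comparable_pairs(described)))
```

A model trained on the aligned pairs can treat the German sentence as a reference translation; with the comparable pairs, the image is the only guaranteed link between the two sentences.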
Provider:
OpenDataLab
Created:
2023-03-22
Collected summary
Dataset introduction

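Background and challenges
Background overview
Multi30K extends the Flickr30K dataset with 31,014 German translations, provided by professional translators, and 155,070 independently collected German descriptions, provided by crowdworkers. Its distinguishing feature is that the relationship between sentences in different languages is known: translated sentences correspond strongly to their English sources, while the independent descriptions share only the image they describe.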
The above content was collected and summarized by 遇见数据集.



