MLM (Multiple Languages and Modalities) dataset
Source: arXiv. Updated: 2020-09-05. Indexed: 2024-06-21.
Download link:
https://doi.org/10.5281/zenodo.3885753
Description:
The MLM dataset was created jointly by researchers at the University of Bonn, the Leibniz Information Centre for Science and Technology (TIB) in Germany, and other institutions. It contains 236,000 samples covering three languages (English, French, and German) and several modalities: text, images, geographic locations, and knowledge graph triples. The dataset was built from the Wikidata knowledge graph, which ensures rich semantic relationships among the entries. It is intended primarily for digital humanities, where it supports the evaluation of cultural and historical phenomena within a multimodal framework, and it also supports research on multimodal representation learning, location estimation, and scene understanding.
Provider:
University of Bonn
Created:
2020-08-14



