News400 Dataset
Source: DataCite Commons; updated 2022-01-20; indexed 2025-04-15
Download link:
https://data.uni-hannover.de/dataset/c729ffd9-8be1-49a9-8c43-dab2f8a87753
Resource description:
# Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

This repository contains the *News400* dataset introduced in the paper:

> Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, and Ralph Ewerth. 2020. Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency. In Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20). Association for Computing Machinery, New York, NY, USA, 16–25. DOI: https://doi.org/10.1145/3372278.3390670

## Content

- **news400.tar.gz**:
  - `dataset.jsonl` containing:
    - Web links to the news texts
    - Web links to the news images
    - Outputs of the named entity recognition and disambiguation (NERD) approach
    - Untampered and tampered entities
  - A `.jsonl` file for each entity type, containing the following information for each entity:
    - Wikidata ID
    - Wikidata label
    - Meta information used for tampering
    - Web links to all reference images crawled from Google, Bing, and Wikidata
  - Splits for testing and validation
- **news400_features.tar.gz**:
  - Visual features of the news images for persons, locations, and scenes
  - Visual features of the reference images for persons, locations, and scenes
- **news400_wordembeddings.tar.gz**: Word embeddings of all nouns in the news texts

## Source Code

The source code to reproduce our results, as well as download scripts to crawl the news texts and images, can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
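The following is a minimal sketch for unpacking `news400.tar.gz` and reading `dataset.jsonl` as JSON Lines, with one JSON object per line. It relies on several assumptions:

- The folder layout inside the archive is not documented here, so the file is located by name.
- The per-record field names are also undocumented, so the code does not depend on any of them.
- The helper names `extract_archive`, `read_jsonl`, `find_file`, and `load_dataset` are hypothetical. They are not part of the authors' code.

The archive holds web links rather than the news texts and images themselves. Crawling those is handled by the download scripts in the GitHub repository.

```python
from __future__ import annotations

import json
import tarfile
from pathlib import Path
from typing import Iterator


def extract_archive(archive: Path, dest: Path) -> None:
    """Unpack a .tar.gz archive (e.g. news400.tar.gz) into dest."""
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safe "data" extraction filter where this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def find_file(root: Path, name: str) -> Path:
    """Return the first file called `name` below root.

    Assumption: the exact directory layout inside the archive is unknown,
    so we search recursively instead of hard-coding a path.
    """
    matches = sorted(p for p in root.rglob(name) if p.is_file())
    if not matches:
        raise FileNotFoundError(f"{name} not found under {root}")
    return matches[0]


def load_dataset(archive: Path, dest: Path) -> list[dict]:
    """Extract the archive and return all records of dataset.jsonl."""
    extract_archive(archive, dest)
    return list(read_jsonl(find_file(dest, "dataset.jsonl")))
```

For example, `records = load_dataset(Path("news400.tar.gz"), Path("news400"))` loads every record. After that, `sorted(records[0].keys())` shows the actual field names before you rely on any of them. The same `read_jsonl` helper should also work for the per-entity-type `.jsonl` files, since they use the same line-delimited format.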
Provided by:
LUIS
Created:
2020-06-04



