TamperedNews & News400 Datasets (IJMIR'21 Update)
Source: Mendeley Data | Updated: 2024-01-31 | Indexed: 2024-06-28
Download link:
https://data.uni-hannover.de/dataset/3724ff7e-ca73-490e-8916-1130be809373
Description:
# Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

This repository contains the *TamperedNews* and *News400* datasets introduced in the paper:

> Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, Sherzod Hakimov, and Ralph Ewerth. "Multimodal news analytics using measures of cross-modal entity and context consistency". In: _International Journal of Multimedia Information Retrieval_ 10.2 (2021), Springer, pp. 111–125. DOI: https://doi.org/10.1007/s13735-021-00207-4

## Content

For both datasets, *TamperedNews* and *News400*, we provide:

- ```*dataset*.tar.gz``` containing the ```*dataset*.jsonl``` with
  - Web links to the news texts
  - Web links to the news images
  - Outputs of the named entity recognition and disambiguation (NERD) approach
  - Untampered and tampered entities
- ```*dataset*_vise_features.tar.gz``` with visual features for events, extracted by our event classification approach VisE, presented at WACV'21 ([paper](https://openaccess.thecvf.com/content/WACV2021/html/Muller-Budack_Ontology-Driven_Event_Type_Classification_in_Images_WACV_2021_paper.html), [GitHub](https://github.com/TIBHannover/VisE))

Please note that the remaining visual features (```*dataset*_features.tar.gz```) and word embeddings (```*dataset*_wordembeddings.tar.gz```) were already provided in the first version of both datasets ([News400](https://data.uni-hannover.de/dataset/news400), [TamperedNews](https://data.uni-hannover.de/dataset/tamperednews)).

For all entities detected in both datasets, we provide:

- ```entities.tar.gz``` containing an ```*entity_type*.jsonl``` for each entity type (events, locations, and persons) with:
  - Wikidata ID
  - Wikidata label
  - Meta information used for tampering
  - Web links to all reference images crawled from Google, Bing, and Wikidata
- ```entities_features.tar.gz``` containing the visual features of the reference images for all entities

## Source Code

The source code to reproduce our results, as well as download scripts to crawl the news texts and images, can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
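
The `.tar.gz` archives listed under Content wrap JSON Lines files, so a first look at the records does not require unpacking anything to disk. The following is a minimal sketch, not part of the official repository: the helper name `iter_jsonl_records` is hypothetical, and it assumes only that each archive holds one or more `.jsonl` members with one JSON object per line. It makes no assumption about the field names inside the records.

```python
import io
import json
import tarfile
from typing import Any, BinaryIO, Dict, Iterator, Optional


def iter_jsonl_records(
    archive_path: Optional[str] = None,
    fileobj: Optional[BinaryIO] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of every .jsonl member.

    Hypothetical helper: pass either a path to a gzip-compressed tar archive
    or an already opened binary file object.
    """
    with tarfile.open(name=archive_path, fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            # Skip directories and any file that is not a JSON Lines file.
            if not member.isfile() or not member.name.endswith(".jsonl"):
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            # Decode the member lazily, line by line, as UTF-8 text.
            for raw_line in io.TextIOWrapper(extracted, encoding="utf-8"):
                line = raw_line.strip()
                if line:
                    yield json.loads(line)
```

For example, `next(iter_jsonl_records("path/to/archive.tar.gz"))` (the path is a placeholder) returns the first record, whose keys can then be inspected to learn the actual schema before writing any further processing code.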
Created:
2024-01-31
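Summary
Background and Challenges
Background Overview
The dataset comprises two parts, TamperedNews and News400, for multimodal news analytics. It focuses on measures of cross-modal entity consistency for detecting tampering in news. It provides links to news texts and images, entity recognition outputs, visual features, and entity reference images, supporting research on news veracity and multimodal verification tasks.
The above content was collected and summarized by 遇见数据集.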



