A Multilingual & Multimodal Text and Image Corpus Dataset for Political Misinformation
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/x356jrj2cz
Description:
Our dataset is a richly annotated multimodal database designed to support robust fake-news detection research. It has two complementary but separate components: an image directory and a text spreadsheet.

The image directory is organized by topic. Each top-level folder is named after a topic, and within each topic folder the images are sorted into real and fake subdirectories according to expert labels. This layout makes the images easy to load for supervised learning or cross-modal testing.

The text data, by contrast, are stored in a single Excel sheet in which each row is one news item. Four columns hold the title, the source, the full text of the news report, and the real/fake label.

Together, the two modalities cover a broad range of time periods and topics, including social-media posts, mainstream-media news reports, and election-related posts. They support training models on both linguistic cues (sensational versus objective tone, grammaticality, metadata quality) and visual cues (original versus photo-manipulated images). Because the images sit in a simple folder hierarchy and the text sits in a richly annotated spreadsheet, the dataset is well specified, reproducible, and easy to feed into downstream machine-learning pipelines.
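Because the image labels are encoded only in the folder structure, a loader has to recover them from each file's path. The following is a minimal Python sketch, assuming the layout `<root>/<topic>/real/<image>` and `<root>/<topic>/fake/<image>` with the subfolders named `real` and `fake` (matched case-insensitively). The function name `index_images`, the 0/1 label encoding, and the set of image extensions are illustrative assumptions, not part of the dataset's documentation.

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple

# Assumed image extensions; adjust to whatever the download actually contains.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

# Assumed label encoding: 0 = real, 1 = fake.
LABELS = {"real": 0, "fake": 1}


class ImageRecord(NamedTuple):
    path: Path   # full path to the image file
    topic: str   # name of the top-level topic folder
    label: int   # 0 for real, 1 for fake (see LABELS)


def index_images(root: Path) -> list[ImageRecord]:
    """Walk <root>/<topic>/<real|fake>/<image> and return one record per image.

    Hypothetical helper: folders other than real/fake, and files whose
    extension is not in IMAGE_EXTS, are skipped silently.
    """
    records: list[ImageRecord] = []
    for topic_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for label_dir in sorted(p for p in topic_dir.iterdir() if p.is_dir()):
            # Case-insensitive match so "Real" or "FAKE" also work.
            label = LABELS.get(label_dir.name.lower())
            if label is None:
                continue  # not a real/fake folder
            for img in sorted(label_dir.iterdir()):
                if img.is_file() and img.suffix.lower() in IMAGE_EXTS:
                    records.append(ImageRecord(img, topic_dir.name, label))
    return records
```

For example, `index_images(Path("path/to/images"))` returns one record per image, which can then be grouped by `topic`, for instance to hold out whole topics during evaluation. The Excel sheet can be read separately (e.g. with `pandas.read_excel`); its exact column headers are not stated here and should be checked against the file.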
Created:
2025-04-25



