A Multilingual & Multimodal Text and Image Corpus Dataset for Political Misinformation

Source: NIAID Data Ecosystem. Indexed: 2026-05-02

Download link:

https://data.mendeley.com/datasets/x356jrj2cz

Description:

Our dataset is a richly annotated multimodal corpus designed to support robust fake-news detection research. It consists of two complementary but separate components: an image directory and a text spreadsheet. The image directory is organized by folder, with one directory per topic named after that topic; within each topic directory, images are placed in real and fake subdirectories according to expert labeling. This organization makes it straightforward to load and process images for cross-modal testing or supervised learning. The text data, in contrast, are kept in a single Excel sheet in which each record is one piece of news. Four separate columns hold the title, source, full news report, and real/fake indicator. Together, these modalities cover a broad range of temporal and topical domains, including social-media posts, mainstream-media news reports, and election-related posts, which allows models to be trained on both linguistic aspects (sensational or objective tone, grammaticality, metadata quality) and visual aspects (original vs. photo-manipulated images). By combining a simple folder hierarchy for images with a richly annotated spreadsheet for text, the dataset is well-specified, reproducible, and easy to feed into a downstream machine-learning pipeline.
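The folder layout described above can be indexed with a short script. The following is only a minimal sketch: the subdirectory names `real` and `fake`, the accepted image extensions, and the function name `index_images` are assumptions made for illustration and are not confirmed by the dataset's documentation.

```python
from pathlib import Path

# Assumed layout, inferred from the description (not verified against the files):
#   <root>/<topic>/real/<image files>
#   <root>/<topic>/fake/<image files>
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}  # assumed extensions


def index_images(root):
    """Return a list of (path, topic, label) tuples.

    label is 0 for images under "real" and 1 for images under "fake".
    Topics and files are visited in sorted order so the index is reproducible.
    """
    records = []
    for topic_dir in sorted(Path(root).iterdir()):
        if not topic_dir.is_dir():
            continue  # skip stray files at the top level
        for label_name, label in (("real", 0), ("fake", 1)):
            label_dir = topic_dir / label_name
            if not label_dir.is_dir():
                continue  # a topic may lack one of the two classes
            for path in sorted(label_dir.iterdir()):
                if path.suffix.lower() in IMAGE_SUFFIXES:
                    records.append((path, topic_dir.name, label))
    return records
```

The resulting list can be passed to any image loader. The text spreadsheet could be read separately, for example with `pandas.read_excel`; its file name and exact column headers are not given here, so they would need to be checked against the downloaded file.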

Created:

2025-04-25
