PolyglotFakeFacts: A multilingual dataset of fake and real news across politics, security, and social domains
Source: NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://data.mendeley.com/datasets/gff8bmr4ff
Description:
PolyglotFakeFacts is a multilingual dataset designed to support research on distinguishing fake from real news across diverse domains such as politics, geopolitics, security, social issues, and military affairs.
The research hypothesis underpinning this dataset is that linguistic and contextual markers of misinformation can be systematically identified across multiple languages, enabling the development of more robust and generalizable fake news detection models.
The dataset contains a balanced collection of human-labeled fake news articles alongside verified real news from trusted media outlets. Because it spans multiple languages and thematic areas, researchers can explore how misinformation manifests in diverse cultural and geopolitical contexts.
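Users may want to confirm the fake/real balance within each language before training, since a globally balanced corpus can still be skewed per language. The sketch below assumes each sample is a dict with `language` and `label` fields; these field names and the toy records are hypothetical and not taken from the published schema.

```python
from collections import Counter, defaultdict

# Toy records. The "language" and "label" field names are assumptions,
# not the documented PolyglotFakeFacts schema.
records = [
    {"language": "en", "label": "fake"},
    {"language": "en", "label": "real"},
    {"language": "fr", "label": "fake"},
    {"language": "fr", "label": "real"},
    {"language": "fr", "label": "real"},
]

def label_counts_by_language(rows):
    """Return {language: Counter({label: count})}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["language"]][row["label"]] += 1
    return dict(counts)

def fake_ratio(counter):
    """Fraction of samples labelled 'fake'; 0.5 means perfectly balanced."""
    total = sum(counter.values())
    return counter["fake"] / total if total else 0.0

for lang, counts in sorted(label_counts_by_language(records).items()):
    print(lang, dict(counts), round(fake_ratio(counts), 2))
```

The same grouping works for the domain field by swapping the key, if the released files carry one under a different name.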
A key finding is that fake news articles often display recurring linguistic and structural patterns regardless of language, whereas real news tends to follow more standardized journalistic conventions. This suggests that multilingual fake news detection could leverage both cross-linguistic similarities and domain-specific features.
The data was gathered by combining manual annotation of fake news samples by human experts with curation of real news from reliable sources. All samples were pre-processed for consistent formatting and duplicate removal, and each carries metadata for language, domain, and label (fake/real).
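The pre-processing steps described above (consistent formatting, duplicate removal, required metadata) can be approximated as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the field names `text`, `language`, `domain`, and `label` are hypothetical, formatting is approximated by Unicode NFC normalization plus whitespace collapsing, and duplicates are detected by hashing the normalized text.

```python
import hashlib
import re
import unicodedata

# Hypothetical schema; the real dataset may use different field names.
REQUIRED_FIELDS = ("text", "language", "domain", "label")
VALID_LABELS = {"fake", "real"}

def normalize(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()

def preprocess(samples):
    """Normalize text, drop exact duplicates, and keep only well-formed rows."""
    seen = set()
    cleaned = []
    for sample in samples:
        if any(field not in sample for field in REQUIRED_FIELDS):
            continue  # skip rows with missing metadata
        if sample["label"] not in VALID_LABELS:
            continue  # skip rows with an unexpected label
        text = normalize(sample["text"])
        # Hash the normalized text so copies differing only in spacing collapse.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**sample, "text": text})
    return cleaned

raw = [
    {"text": "Breaking  news\n today", "language": "en", "domain": "politics", "label": "fake"},
    {"text": "Breaking news today", "language": "en", "domain": "politics", "label": "fake"},
    {"text": "Le ministre a parlé.", "language": "fr", "domain": "politics", "label": "real"},
    {"text": "No metadata here"},
]
print(len(preprocess(raw)))  # 2
```

Keying duplicates on text alone keeps the first occurrence; near-duplicate detection (e.g. shingling) would need a different technique.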
The dataset is intended for researchers who aim to:
- train and evaluate machine learning and deep learning models for fake news classification,
- perform cross-lingual and multilingual comparative studies,
- investigate the linguistic, semantic, and thematic characteristics of misinformation.
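For cross-lingual studies, one common protocol is leave-one-language-out evaluation: train on all languages but one and test on the held-out language. The sketch below only builds the splits; the `language` field name and the toy samples are hypothetical, and the protocol is a general evaluation technique rather than one prescribed by the dataset authors.

```python
from collections import defaultdict

# Toy samples; the "language" and "label" field names are assumptions.
samples = [
    {"text": "a", "language": "en", "label": "fake"},
    {"text": "b", "language": "en", "label": "real"},
    {"text": "c", "language": "es", "label": "fake"},
    {"text": "d", "language": "es", "label": "real"},
    {"text": "e", "language": "de", "label": "real"},
]

def leave_one_language_out(rows):
    """Yield (held_out_language, train_rows, test_rows) for every language."""
    by_lang = defaultdict(list)
    for row in rows:
        by_lang[row["language"]].append(row)
    for held_out in sorted(by_lang):
        # Training set: every row whose language differs from the held-out one.
        train = [r for lang, rs in by_lang.items() if lang != held_out for r in rs]
        yield held_out, train, by_lang[held_out]

for lang, train, test in leave_one_language_out(samples):
    print(lang, len(train), len(test))
```

Each fold can then be fed to any classifier; a large drop on the held-out language indicates features that do not transfer across languages.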
By providing a curated, multilingual, and domain-diverse resource, PolyglotFakeFacts enables the community to develop more transparent, explainable, and resilient AI models for combating online misinformation.
Created:
2026-02-17