Combined rumor and non-rumor dataset
Source: IEEE. Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/documents/combined-rumor-and-non-rumor-dataset
Description:
This dataset, comprising 103,806 text entries, is a comprehensive resource for rumor detection on social media, constructed by merging benchmark collections including PHEME, LIAR Fake News, Twitter15, Twitter16, and ISOT Fake News. It features a binary classification schema (47% rumor, 53% non-rumor) and integrates original and adversarially augmented samples to enhance model robustness. Augmentation, applied selectively to the rumor class, employs the TextAttack framework with EmbeddingAugmenter (20% word swaps) and CharSwapAugmenter (character-level perturbations), preserving semantic integrity while introducing realistic textual variations. Preprocessing includes text normalization (e.g., lowercase conversion, URL/user placeholders)
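The text normalization mentioned in the description (lowercase conversion, URL/user placeholders) could be sketched roughly as follows. This is a minimal illustration, not the dataset author's actual pipeline: the placeholder tokens `<url>` and `<user>`, the regular expressions, and the function name `normalize` are all assumptions made for this example.

```python
import re

# Assumed patterns: the dataset description does not specify how URLs
# and user mentions were detected, so these are illustrative only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")


def normalize(text: str) -> str:
    """Lowercase text and replace URLs and @mentions with placeholders.

    The placeholder tokens "<url>" and "<user>" are hypothetical; the
    original dataset may use different tokens.
    """
    text = URL_RE.sub("<url>", text)    # replace links first
    text = USER_RE.sub("<user>", text)  # then @mentions
    text = text.lower()                 # lowercase conversion
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


if __name__ == "__main__":
    print(normalize("BREAKING: @BBCNews says X https://t.co/abc"))
```

Note that the simple `@\w+` pattern would also match the domain part of an email address, so a real pipeline would likely need stricter rules.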
Provided by:
Alohali, Mansor



