Combined rumor and non-rumor dataset

Name: Combined rumor and non-rumor dataset
Creator: IEEE DataPort
Published: 2025-03-31 07:33:26
License: No description available

Source: DataCite Commons. Updated 2025-03-31; indexed 2025-04-16.

Download link:

https://ieee-dataport.org/documents/combined-rumor-and-non-rumor-dataset

Description:

This dataset of 103,806 text entries supports rumor detection on social media. It merges benchmark collections including PHEME, LIAR Fake News, Twitter15, Twitter16, and ISOT Fake News. Labels are binary (47% rumor, 53% non-rumor), and the data combines original samples with adversarially augmented ones to improve model robustness. Augmentation is applied only to the rumor class and uses the TextAttack framework with EmbeddingAugmenter (20% word swaps) and CharSwapAugmenter (character-level perturbations), with the aim of preserving meaning while introducing realistic textual variation. Preprocessing includes text normalization (e.g., lowercase conversion, URL/user placeholders).
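The text normalization mentioned above could look roughly like the following Python sketch. It is illustrative only, not the dataset authors' pipeline: the placeholder tokens `<url>` and `<user>`, the regular expressions, and the function name `normalize` are all assumptions, since the description does not specify them.

```python
import re

# Assumed patterns; the dataset description does not say how URLs and
# user mentions were detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
WS_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Lowercase text and replace URLs and @mentions with placeholders.

    The placeholder tokens "<url>" and "<user>" are hypothetical choices.
    """
    text = text.lower()
    text = URL_RE.sub("<url>", text)    # URLs -> placeholder
    text = USER_RE.sub("<user>", text)  # @mentions -> placeholder
    return WS_RE.sub(" ", text).strip()  # collapse repeated whitespace


if __name__ == "__main__":
    print(normalize("BREAKING: @BBCNews confirms it https://t.co/abc123  now"))
```

Lowercasing runs first so that the patterns only need to handle one casing; the placeholders are inserted afterwards and are already lowercase.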

Provider:

IEEE DataPort

Created:

2025-03-31
