
Combined rumor and non-rumor dataset

Source: DataCite Commons. Updated 2025-03-31; indexed 2025-04-16.
Download link:
https://ieee-dataport.org/documents/combined-rumor-and-non-rumor-dataset
Description:
This dataset, comprising 103,806 text entries, is a comprehensive resource for rumor detection on social media, constructed by merging benchmark collections including PHEME, LIAR Fake News, Twitter15, Twitter16, and ISOT Fake News. It features a binary classification schema (47% rumor, 53% non-rumor) and integrates original and adversarially augmented samples to enhance model robustness. Augmentation, applied selectively to the rumor class, employs the TextAttack framework with EmbeddingAugmenter (20% word swaps) and CharSwapAugmenter (character-level perturbations), preserving semantic integrity while introducing realistic textual variations. Preprocessing includes text normalization (e.g., lowercase conversion, URL/user placeholders).
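
The normalization step mentioned in the description (lowercase conversion, URL/user placeholders) could look roughly like the sketch below. This is an illustration only, not the dataset authors' pipeline: the placeholder tokens `<url>` and `<user>`, the regular expressions, and the function name `normalize` are all assumptions, since the listing does not specify them.

```python
import re

# Assumed patterns; the listing does not say how URLs or user mentions were matched.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")


def normalize(text: str) -> str:
    """Replace URLs and @mentions with placeholders, lowercase, tidy whitespace."""
    text = URL_RE.sub("<url>", text)    # hypothetical placeholder token
    text = USER_RE.sub("<user>", text)  # hypothetical placeholder token
    text = text.lower()
    # Collapse runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize("BREAKING: @BBCNews says http://t.co/abc123 is FAKE"))
```

Replacing URLs and mentions before lowercasing keeps the matching independent of letter case in the original post; the placeholders themselves are already lowercase, so the later `lower()` call leaves them unchanged.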

Provider:
IEEE DataPort
Created:
2025-03-31