Dataset for TMED

DataCite Commons | Updated 2026-04-19 | Indexed 2026-05-03
Download link:
https://ieee-dataport.org/documents/dataset-tmed-0
Description:
Datasets. We employ eight publicly available benchmark datasets spanning diverse domains and content types for cross-domain emerging topic rumor detection. Five datasets serve as source domains: FEVER, a large-scale fact verification dataset containing short declarative statements; GettingReal and GossipCop, which consist of full-length news articles collected from real-world news outlets; LIAR, comprising short political statements from PolitiFact; and PHEME, which contains social media posts from Twitter related to breaking news events. Three datasets are used as target domains to simulate emerging topics: CoAID, a COVID-19 healthcare misinformation dataset covering news articles and claims, notable for its highly imbalanced label distribution (over 90% non-rumor); Constraint, a COVID-19 fake news dataset collected from social media with a nearly balanced class distribution; and ANTiVax, a Twitter dataset focusing on COVID-19 vaccine misinformation. All datasets are binary-labeled as rumor or non-rumor. The datasets vary considerably in average text length (from 9.4 to 738.9 tokens), content type (statements, news articles, and social network posts), and class distribution, providing a comprehensive testbed for evaluating cross-domain generalization. Following prior work, each dataset is split into training, validation, and test sets at a 7:2:1 ratio, and 15 source–target adaptation scenarios are constructed by pairing each of the five source datasets with each of the three target datasets.
Provider:
IEEE DataPort
Created:
2026-04-19
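
The 7:2:1 split and the 15 source-target pairings mentioned in the description can be illustrated with a minimal Python sketch. The dataset names come from the description; everything else is an assumption made for illustration, including the helper names `split_721` and `adaptation_scenarios`, the fixed seed, and the plain random shuffle. The description does not say whether the original split is stratified or which seed was used.

```python
import random
from itertools import product

# Dataset names as listed in the description.
SOURCES = ["FEVER", "GettingReal", "GossipCop", "LIAR", "PHEME"]
TARGETS = ["CoAID", "Constraint", "ANTiVax"]


def split_721(items, seed=0):
    """Shuffle and split items into train/validation/test at a 7:2:1 ratio.

    Hypothetical helper: a plain random split with an assumed seed,
    not necessarily the procedure used to build the dataset.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = (7 * n) // 10
    n_val = (2 * n) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def adaptation_scenarios(sources=SOURCES, targets=TARGETS):
    """Pair every source dataset with every target dataset."""
    return list(product(sources, targets))


scenarios = adaptation_scenarios()
train, val, test = split_721(range(1000))
print(len(scenarios), len(train), len(val), len(test))  # 15 700 200 100
```

For a heavily imbalanced target such as CoAID, a stratified split may be preferable so that each partition keeps roughly the same rumor/non-rumor ratio; whether the dataset authors did this is not stated.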