"Dataset for TMED"
DataCite Commons | Updated 2026-04-19 | Indexed 2026-05-03
Download link:
https://ieee-dataport.org/documents/dataset-tmed-0
Resource description:
Datasets. We employ eight publicly available benchmark datasets spanning diverse domains and content types for cross-domain emerging topic rumor detection. Five datasets serve as source domains: FEVER, a large-scale fact verification dataset containing short declarative statements; GettingReal and GossipCop, which consist of full-length news articles collected from real-world news outlets; LIAR, comprising short political statements from PolitiFact; and PHEME, which contains social media posts from Twitter related to breaking news events. Three datasets are used as target domains to simulate emerging topics: CoAID, a COVID-19 healthcare misinformation dataset covering news articles and claims, notable for its highly imbalanced label distribution (over 90% non-rumor); Constraint, a COVID-19 fake news dataset collected from social media with a nearly balanced class distribution; and ANTiVax, a Twitter dataset focusing on COVID-19 vaccine misinformation. All datasets are binary-labeled as rumor or non-rumor. The datasets vary considerably in average text length (from 9.4 to 738.9 tokens), content type (statements, news articles, and social media posts), and class distribution, providing a comprehensive testbed for evaluating cross-domain generalization. Following prior work, each dataset is split into training, validation, and test sets at a 7:2:1 ratio, and 15 source–target adaptation scenarios are constructed by pairing each of the five source datasets with each of the three target datasets.
Provider:
IEEE DataPort
Created:
2026-04-19



