
Fake news

DataCite Commons | Updated 2024-11-02 | Indexed 2025-04-16
Download link:
https://ieee-dataport.org/documents/fake-news
Resource description:
The data consist of social media posts that carry multimodal information, including text, images, and comment relations. Two fake news datasets, one in English and one in Chinese, were used to train and test our proposed model:
\begin{itemize}
  \item \textbf{Fakeddit}\footnote{https://github.com/entitize/Fakeddit}: an English multimodal dataset containing text, comments, and images from Reddit (n = 3,127). Samples are labeled under both a 2-way scheme (real, fake) and a 3-way scheme (real, fake with true text, fake with false text). Modeling was performed with both schemes.
  \item \textbf{Multimodal Fake News Detection dataset}\footnote{https://data.beijing.gov.cn/kjzy2020/index.html} (MFND): a Chinese multimodal dataset containing text, comments, and images from Weibo (n = 2,953). Samples fall into three classes: uncertain, fake news, and real news.
\end{itemize}
Table~\ref{tab:dataset_overview} gives the number of instances for each label.

\begin{table}[ht]
\centering
\caption{Fake news datasets overview}
\begin{tabular}{c|c|c|c|c}
\toprule
Dataset & 2-way & 3-way & n & Total \\
\midrule
 & - & real & 1,000 & \\
MFND & - & fake & 953 & 2,953 \\
 & - & uncertain & 1,000 & \\
\midrule
 & real & real & 1,048 & \\
Fakeddit & fake & fake with true text & 1,060 & 3,127 \\
 & & fake with false text & 1,019 & \\
\bottomrule
\end{tabular}
\label{tab:dataset_overview}
\end{table}

In both datasets the three-way classes are of similar size, with no large gaps between labels. This balance gives a more reliable basis for analysis and modeling.

Each news item in the fake news datasets is composed of several modalities: posts, images, and comments. Figure~\ref{fig:2} illustrates the overall distribution of these modalities.

\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{fig/fig2.png}
\caption{Distribution of modalities}
\label{fig:2}
\end{figure}

Figure~\ref{fig:3} shows one sample from the dataset: a user posts a narrative together with an image, and many other users respond in the comments. The comments differ substantially in meaning from the post, and most of them express emotional opinions. Some users also try to verify the information; in this example, a commenter claims to have seen related information on YouTube. The sample illustrates the multimodal and semantic diversity of the dataset.

\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{fig/4.png}
\caption{Example of dataset}
\label{fig:3}
\end{figure}
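The balance claim can be checked with simple arithmetic on the three-way label counts reported in the overview table. The following is a minimal sketch, not part of the original work: the counts are copied from the table, while the `counts` dictionary and the `balance_summary` helper are hypothetical names introduced only for illustration.

```python
# Three-way class counts copied from the dataset overview table.
# The dictionary layout and helper name are illustrative assumptions.
counts = {
    "MFND": {"real": 1000, "fake": 953, "uncertain": 1000},
    "Fakeddit": {
        "real": 1048,
        "fake with true text": 1060,
        "fake with false text": 1019,
    },
}


def balance_summary(class_counts):
    """Return the total, per-class shares, and largest-to-smallest class ratio."""
    total = sum(class_counts.values())
    shares = {label: n / total for label, n in class_counts.items()}
    ratio = max(class_counts.values()) / min(class_counts.values())
    return total, shares, ratio


for name, class_counts in counts.items():
    total, shares, ratio = balance_summary(class_counts)
    rounded = {label: round(share, 3) for label, share in shares.items()}
    print(name, total, rounded, round(ratio, 3))
```

A largest-to-smallest ratio close to 1 means the classes are of similar size; how close counts as "balanced" is a judgment call that depends on the modeling setup.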
Provider:
IEEE DataPort
Created:
2024-11-02