Fake news

Name: Fake news
Creator: IEEE DataPort
Published: 2024-11-02 11:17:28
License: No description available

DataCite Commons, updated 2024-11-02, indexed 2025-04-16

Download link:

https://ieee-dataport.org/documents/fake-news

Description:

The data come from social media posts that carry multimodal information such as text, images, and comment relations. Two fake news datasets, one in English and one in Chinese, were sourced to train and test our proposed model:

\begin{itemize}
  \item \textbf{Fakeddit}\footnote{https://github.com/entitize/Fakeddit}: an English multimodal dataset containing text, comments, and images from Reddit (n = 3,127). Samples are labeled under both a 2-way scheme (real, fake) and a 3-way scheme (real, fake with true text, fake with false text). Modeling was performed with both schemes.
  \item \textbf{Multimodal Fake News Detection dataset}\footnote{https://data.beijing.gov.cn/kjzy2020/index.html} (MFND): a Chinese multimodal dataset containing text, comments, and images from Weibo (n = 2,953). Samples fall into three classes: uncertain, fake news, and real news.
\end{itemize}

The number of instances for each class label is given in Table \ref{tab:dataset_overview}.

\begin{table}[ht]
  \centering
  \caption{Fake news datasets overview}
  \begin{tabular}{c|c|c|c|c}
    \toprule
    Dataset  & 2-way label & 3-way label          & n     & Total \\
    \midrule
             & -           & real                 & 1,000 &       \\
    MFND     & -           & fake                 & 953   & 2,953 \\
             & -           & uncertain            & 1,000 &       \\
    \midrule
             & real        & real                 & 1,048 &       \\
    Fakeddit & fake        & fake with true text  & 1,060 & 3,127 \\
             &             & fake with false text & 1,019 &       \\
    \bottomrule
  \end{tabular}
  \label{tab:dataset_overview}
\end{table}

Both datasets are relatively balanced: class sizes range from 953 to 1,000 in MFND and from 1,019 to 1,060 in Fakeddit, with no significant gaps between individual labels. This balance provides a more reliable foundation for analysis and modeling.

Each news item is composed of several modalities, namely posts, images, and comments. Figure \ref{fig:2} illustrates the overall distribution of these modalities.
\begin{figure}[h]
  \centering
  \includegraphics[width=\linewidth]{fig/fig2.png}
  \caption{Distribution of modalities}
  \label{fig:2}
\end{figure}

Figure \ref{fig:3} shows a sample from the dataset: a user posts a narrative together with an image, and many other users respond in the comments. The semantic gap between these comments and the post is substantial, since most comments express emotional opinions. Some users also try to verify the authenticity of the information; in this example, a commenter claims to have seen related information on YouTube. The sample thus illustrates the dataset's multimodal and semantic diversity.

\begin{figure}[h]
  \centering
  \includegraphics[width=\linewidth]{fig/4.png}
  \caption{Example sample from the dataset}
  \label{fig:3}
\end{figure}
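As a quick sanity check on the balance claim, the class counts from the overview table can be tallied in a few lines of Python. This is a minimal sketch and not part of the authors' pipeline: the dictionary layout and the helper names `class_shares` and `imbalance_ratio` are illustrative assumptions, and the counts are copied by hand from the table.

```python
# Class counts copied by hand from the dataset overview table (3-way labels).
COUNTS = {
    "MFND": {"real": 1000, "fake": 953, "uncertain": 1000},
    "Fakeddit": {
        "real": 1048,
        "fake with true text": 1060,
        "fake with false text": 1019,
    },
}


def class_shares(counts):
    """Return each label's share of the dataset total (hypothetical helper)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(counts):
    """Largest class size divided by smallest; 1.0 means perfectly balanced."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        print(f"{name}: total = {sum(counts.values())}")
        for label, share in class_shares(counts).items():
            print(f"  {label}: {share:.1%}")
        print(f"  imbalance ratio: {imbalance_ratio(counts):.3f}")
```

The ratio of the largest to the smallest class is only one simple way to quantify "relatively balanced"; other measures, such as the entropy of the label distribution, would serve equally well.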

Provider:

IEEE DataPort

Created:

2024-11-02
