D-DS-Fake experimental dataset
Source: DataCite Commons, updated 2026-03-25, indexed 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=1c3a637debab4f828c8ed030a4bb93e4
Resource description:
This dataset supports research on multimodal misinformation detection. It contains three public benchmark datasets that have been cleaned and processed in a uniform way: Fakeddit, GossipCop, and PolitiFact. The dataset was constructed following standard research practice. The raw data indexes were first obtained through official channels, and the heterogeneous raw data was then standardized with Python and its data processing libraries. The processing steps include text noise removal, conversion of timestamps to a unified format, multi-granularity label mapping, and social feature extraction. For multimodal alignment, the multimedia links in the original metadata were parsed so that each text sample is strictly indexed to its corresponding image URL.
The data is stored in .csv format. For example, the file fakeddit_train_clean.csv contains 29927 records in a single table. Its columns are as follows: id is the unique identifier of the sample; clean_title and title are the preprocessed and original news title text, respectively; created_utc is the news release time in UTC seconds; domain records the domain name of the news source; hasImage is a Boolean field indicating whether the sample contains image data; image_url gives the online storage address of the corresponding image; num_comments, score, and upvote_ratio are the social metadata indicators for interaction volume, score, and like ratio; labels and 2_way_label/3_way_label/6_way_label provide authenticity labels at different classification granularities.
In terms of data quality, samples with severe text loss or failed links have been excluded.
However, because some third-party social platform links expire over time, a few of the original image URLs may no longer return the multimedia data. This is a common limitation of dynamic Internet data. Researchers replicating results are advised to verify availability against our index file. No data has been withheld or artificially masked, and all processed non-sensitive features are fully open. The files can be read and analyzed with a standard text editor or with the Pandas library in a Python environment. For details, see the README file in the data package.
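As an illustration of the Pandas route mentioned above, here is a minimal sketch of loading one of the cleaned files. The column names (id, clean_title, created_utc, hasImage, image_url, and so on) follow the usual Fakeddit naming and are assumptions to be checked against the README. The helper names load_clean_csv and rows_with_images are hypothetical and not part of the data package.

```python
import io

import pandas as pd

# Assumed column names (Fakeddit-style); confirm against the README.
EXPECTED_COLUMNS = [
    "id", "clean_title", "title", "created_utc", "domain",
    "hasImage", "image_url", "num_comments", "score", "upvote_ratio",
]


def load_clean_csv(source):
    """Read one cleaned CSV and add a parsed UTC datetime column."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # created_utc is described as UTC seconds, so parse it as epoch seconds.
    df["created_at"] = pd.to_datetime(df["created_utc"], unit="s", utc=True)
    return df


def rows_with_images(df):
    """Keep rows flagged as having an image and carrying a non-empty URL."""
    has_image = df["hasImage"].astype(str).str.lower().eq("true")
    has_url = df["image_url"].notna() & df["image_url"].astype(str).str.strip().ne("")
    return df[has_image & has_url]
```

A call such as load_clean_csv("fakeddit_train_clean.csv") should work if the column names match. The description gives 29927 records for that file, so comparing len(df) against that figure is a quick sanity check. Note that rows_with_images only filters on the metadata; it does not test whether an image URL is still reachable.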
Provider:
Science Data Bank
Created:
2026-03-25



