A collection of original samples of real and fake news in English and Russian
Source: DataCite Commons. Updated: 2025-04-27. Indexed: 2025-04-16.
Download link:
https://www.scidb.cn/detail?dataSetId=3892eb5213394b7fad433156367dab54
Description:
The English corpus comes from the fake news dataset publicly released on the Kaggle data analysis competition platform. The Russian fake news corpus comes from the Russian Panorama News Network (Panorama), which focuses on current political news and has been used by scholars in fake news research, making it credible and representative for this purpose. The Russian real news corpus is collected from Interfax, one of the three most authoritative news agencies in Russia; it also focuses on current political news, which supports the representativeness and authenticity of the corpus.
The sample set divides the original English and Russian real and fake news corpus into two parts. The analysis set is used to analyze the measures that differ between fake news and real news. The test set is used for the final automatic clustering test, which examines how well the commonality analysis works.
The analysis set contains 8 pairs of real and fake news in each language (16 samples per language): FE1 to FE8 for English fake news, TE1 to TE8 for English real news, FR1 to FR8 for Russian fake news, and TR1 to TR8 for Russian real news. The test set contains 5 pairs in each language (10 samples per language): FE9 to FE13 for English fake news, TE9 to TE13 for English real news, FR9 to FR13 for Russian fake news, and TR9 to TR13 for Russian real news.
The samples within each language are of comparable size, which makes it convenient to compare real and fake news in combination. To preserve the integrity of textual structure and semantics, individual news items are not segmented at a finer granularity.
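The sample naming scheme described above can be sketched in a few lines of Python. This is only an illustration of the ID convention (F/T for fake/real, E/R for English/Russian, 1 to 8 for the analysis set, 9 to 13 for the test set); the function names `sample_ids` and `parse_sample_id` are hypothetical, and nothing here assumes how the files in the dataset are actually named or stored.

```python
# Pair numbers per subset, as stated in the description.
ANALYSIS_RANGE = range(1, 9)   # pairs 1..8
TEST_RANGE = range(9, 14)      # pairs 9..13


def sample_ids(numbers):
    """Build IDs such as FE1, TE1, ..., FR1, TR1 for the given pair numbers."""
    return [
        f"{label}{lang}{n}"
        for lang in "ER"        # E = English, R = Russian
        for n in numbers
        for label in "FT"       # F = fake, T = real (true)
    ]


def parse_sample_id(sample_id):
    """Decode an ID like 'FR9' into label, language, pair number, and subset."""
    label = {"F": "fake", "T": "real"}[sample_id[0]]
    language = {"E": "English", "R": "Russian"}[sample_id[1]]
    number = int(sample_id[2:])
    if number in ANALYSIS_RANGE:
        subset = "analysis"
    elif number in TEST_RANGE:
        subset = "test"
    else:
        raise ValueError(f"pair number out of range: {sample_id}")
    return {"label": label, "language": language, "pair": number, "subset": subset}


analysis_ids = sample_ids(ANALYSIS_RANGE)
test_ids = sample_ids(TEST_RANGE)
```

Under this reading of the scheme, `analysis_ids` covers both languages together, and `parse_sample_id("FR9")` identifies a Russian fake news sample in the test set.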
Provider:
Science Data Bank
Created:
2023-08-21



