Fake.Br Corpus
Dataset Overview
Dataset Name
- Fake.Br Corpus
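Dataset Structure
- full_texts folder
- fake folder: contains the collected fake news.
- true folder: contains the collected true news.
- fake-meta-information folder: contains the metadata for each fake news item.
- true-meta-information folder: contains the metadata for each true news item.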
Meta-information file format (one field per line, in this order):
- author
- link
- category
- date of publication
- number of tokens
- number of words without punctuation
- number of types
- number of links inside the news
- number of words in upper case
- number of verbs
- number of subjunctive and imperative verbs
- number of nouns
- number of adjectives
- number of adverbs
- number of modal verbs (mainly auxiliary verbs)
- number of singular first and second personal pronouns
- number of plural first personal pronouns
- number of pronouns
- pausality
- number of characters
- average sentence length
- average word length
- percentage of news with spelling errors
- emotiveness
- diversity
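The line-by-line layout described above can be read into a dictionary with a few lines of Python. This is a minimal sketch, not the corpus authors' tooling: the key names in `META_FIELDS` and the function `parse_meta` are illustrative, and the sketch assumes each file holds exactly one value per line in the listed order. Values are kept as strings because their exact formats are not specified here.

```python
# Field order as listed in the dataset description; key names are illustrative.
META_FIELDS = [
    "author",
    "link",
    "category",
    "date_of_publication",
    "number_of_tokens",
    "number_of_words_without_punctuation",
    "number_of_types",
    "number_of_links_inside_the_news",
    "number_of_words_in_upper_case",
    "number_of_verbs",
    "number_of_subjunctive_and_imperative_verbs",
    "number_of_nouns",
    "number_of_adjectives",
    "number_of_adverbs",
    "number_of_modal_verbs",
    "number_of_singular_first_and_second_personal_pronouns",
    "number_of_plural_first_personal_pronouns",
    "number_of_pronouns",
    "pausality",
    "number_of_characters",
    "average_sentence_length",
    "average_word_length",
    "percentage_of_news_with_spelling_errors",
    "emotiveness",
    "diversity",
]


def parse_meta(text):
    """Map each line of a meta-information file to its field name.

    Assumes one value per line, in the order of META_FIELDS.
    """
    lines = text.splitlines()
    if len(lines) != len(META_FIELDS):
        raise ValueError(
            f"expected {len(META_FIELDS)} lines, found {len(lines)}"
        )
    # Keep values as strings; callers can convert numeric fields as needed.
    return dict(zip(META_FIELDS, (line.strip() for line in lines)))
```

The length check is deliberate: if a file does not match the assumed layout, it is safer to fail than to assign values to the wrong fields.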
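- size_normalized_texts folder
- Contains truncated texts: in each fake-true news pair, the longer text is truncated (by number of words) to the length of the shorter one. This version of the dataset can be used to avoid length-related bias in machine learning experiments.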
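The truncation described above can be illustrated with a short Python sketch. It assumes words are separated by whitespace, which may differ from the tokenization used to build the corpus; the function name `normalize_pair` is hypothetical.

```python
def normalize_pair(fake, true):
    """Truncate the longer text of a fake-true pair to the shorter one's word count.

    Words are split on whitespace (an assumption), so runs of whitespace
    are collapsed to single spaces in the output.
    """
    fake_words = fake.split()
    true_words = true.split()
    n = min(len(fake_words), len(true_words))
    return " ".join(fake_words[:n]), " ".join(true_words[:n])
```

For example, `normalize_pair("a b c d e", "x y")` returns `("a b", "x y")`: both texts end up with two words, so a classifier cannot rely on length alone to separate the pair.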
Citation
- When using this dataset, please cite the following publications:
- PROPOR 2018 conference paper:
Monteiro R.A., Santos R.L.S., Pardo T.A.S., de Almeida T.A., Ruiz E.E.S., Vale O.A. (2018) Contributions to the Study of Fake News in Portuguese: New Corpus and Automatic Detection Results. In: Villavicencio A. et al. (eds) Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham
- Expert Systems with Applications article:
Silva R.M., Santos R.L.S., Almeida T.A., Pardo T.A.S. (2020) Towards Automatically Filtering Fake News in Portuguese. Expert Systems with Applications, vol 146, p. 113199.