recogna-nlp/fakerecogna2-abstrativa
Source: Hugging Face. Updated 2025-06-06. Indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/recogna-nlp/fakerecogna2-abstrativa
Resource description:
FakeRecogna 2.0 is an extension of the FakeRecogna dataset for fake news detection. It contains real and fake news texts collected from online media and ten fact-checking sources in Brazil. The real and fake samples are unrelated to each other, which avoids introducing intrinsic bias into the data. The fake news samples were collected from verified Brazilian news websites, while the real news came from well-known Brazilian media platforms. Because real news articles are typically longer than fake ones, the real news was preprocessed with text summarization. Several text representations were used to build the input feature vectors for machine learning models: Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), FastText, PTT5, and BERTimbau. The final dataset contains 26,400 fake news articles.
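As a rough illustration of the two simplest representations named above, the following minimal sketch builds Bag of Words and TF-IDF vectors in plain Python. The three toy sentences, the whitespace tokenizer, and the helper names (`tokenize`, `bow_vector`, `tfidf_vector`) are assumptions made for this example and are not taken from the dataset or its authors' pipeline. The weighting is the textbook form (term frequency times the log of inverse document frequency), which may differ from the variant actually used.

```python
import math
from collections import Counter

# Hypothetical toy corpus; these strings are NOT samples from the dataset.
docs = [
    "governo anuncia nova vacina",
    "vacina causa doença diz boato",
    "governo desmente boato",
]


def tokenize(text):
    # Naive lowercase + whitespace split; a real pipeline would do more.
    return text.lower().split()


tokenized = [tokenize(d) for d in docs]

# Fixed, sorted vocabulary so every vector has the same column order.
vocab = sorted({tok for doc in tokenized for tok in doc})

# Document frequency: number of documents containing each term.
n_docs = len(tokenized)
doc_freq = {term: sum(term in doc for doc in tokenized) for term in vocab}


def bow_vector(tokens):
    # Bag of Words: raw count of each vocabulary term in the document.
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def tfidf_vector(tokens):
    # Textbook TF-IDF: (count / document length) * ln(N / document frequency).
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty documents
    return [
        (counts[term] / total) * math.log(n_docs / doc_freq[term])
        for term in vocab
    ]


bow = [bow_vector(doc) for doc in tokenized]
tfidf = [tfidf_vector(doc) for doc in tokenized]

for text, vec in zip(docs, tfidf):
    print(text, [round(v, 3) for v in vec])
```

Terms that appear in a single document (such as "anuncia") receive the largest weights, while terms shared across documents (such as "governo") are down-weighted. Either list of vectors could then be passed to a classifier as input features.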
Provider:
recogna-nlp



