recogna-nlp/fakerecogna2-extrativa

Name: recogna-nlp/fakerecogna2-extrativa
Creator: recogna-nlp
Published: 2025-06-06 02:49:35
License: No description available

Hugging Face, updated 2025-06-06, indexed 2025-07-05

Download link:

https://hf-mirror.com/datasets/recogna-nlp/fakerecogna2-extrativa

Description:

FakeRecogna 2.0 extends the FakeRecogna dataset for fake news detection. It contains real and fake news texts collected from Brazilian online media and ten fact-checking sources. The real and fake samples are deliberately unrelated to each other, which avoids intrinsic bias in the data. Fake news samples were collected from verified Brazilian news websites, while real news was selected from well-known Brazilian media platforms. Because real news articles are typically longer than fake ones, they were preprocessed with text summarization. The dataset includes feature vectors built with several text representation techniques, including Bag of Words, TF-IDF, FastText, PTT5, and BERTimbau. After duplicate removal, the final dataset contains 26,400 unique fake news articles.
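
The description mentions duplicate removal and Bag of Words / TF-IDF representations without saying how they were computed. The following is a minimal, standard-library-only sketch of those ideas, not the dataset authors' pipeline: the normalization rule, the function names (`normalize`, `deduplicate`, `bag_of_words`, `tf_idf`), and the sample headlines are all assumptions made for illustration.

```python
import math
from collections import Counter


def normalize(text):
    # Assumed normalization: lowercase and collapse whitespace.
    # The dataset authors' actual rule is not stated in the description.
    return " ".join(text.lower().split())


def deduplicate(texts):
    # Keep the first occurrence of each text, comparing normalized forms.
    seen = set()
    unique = []
    for text in texts:
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def bag_of_words(text):
    # Map each whitespace-separated token to its count in the text.
    return Counter(normalize(text).split())


def tf_idf(texts):
    # Plain TF-IDF: relative term frequency times log(N / document frequency).
    bows = [bag_of_words(text) for text in texts]
    n_docs = len(bows)
    doc_freq = Counter()
    for bow in bows:
        doc_freq.update(bow.keys())
    vectors = []
    for bow in bows:
        total = sum(bow.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bow.items()
        })
    return vectors


# Hypothetical headlines, invented for this example (not taken from the dataset).
samples = [
    "Governo anuncia novo programa",
    "governo  anuncia novo programa",
    "Vacina causa efeito inesperado",
]
unique = deduplicate(samples)
vectors = tf_idf(unique)
```

Here the second headline differs from the first only in case and spacing, so `deduplicate` drops it. In practice a library such as scikit-learn would likely replace the hand-written TF-IDF; the version above only illustrates the arithmetic.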

Provider:

recogna-nlp
