clickbait-news-bg
OpenXLab, indexed 2026-04-18
Download link:
https://openxlab.org.cn/datasets/OpenDataLab/clickbait-news-bg
Description:
This is a corpus of Bulgarian news articles from a fixed period of time whose factuality had been questioned.
The articles come from 377 different sources covering various domains, including politics, interesting facts, and tips and tricks.
The dataset was prepared for the Hack the Fake News hackathon. It was provided by the
Bulgarian Association of PR Agencies and is available on GitLab.
The corpus was collected automatically and then annotated by journalism students.
The training set contains 2,815 examples, of which 1,940 (69%) are fake news
and 1,968 (70%) are clickbait. There are 761 test examples.
The fake news and clickbait labels have a 98% correlation.
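As a rough illustration of how the label shares and the correlation between the two labels might be checked, here is a minimal Python sketch. It assumes each label is available as a list of 0/1 values. The function name `label_summary` and the toy data are hypothetical, and the phi coefficient (Pearson correlation for two binary variables) is only one possible reading of "correlation" here, not necessarily the measure the dataset authors used.

```python
from math import sqrt


def label_summary(fake, clickbait):
    """Return the share of positive labels and the phi coefficient.

    `fake` and `clickbait` are equal-length sequences of 0/1 labels
    (a hypothetical in-memory representation, not the dataset's file format).
    """
    if len(fake) != len(clickbait) or not fake:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(fake)
    # 2x2 contingency counts over the two binary labels.
    n11 = sum(1 for f, c in zip(fake, clickbait) if f == 1 and c == 1)
    n10 = sum(1 for f, c in zip(fake, clickbait) if f == 1 and c == 0)
    n01 = sum(1 for f, c in zip(fake, clickbait) if f == 0 and c == 1)
    n00 = n - n11 - n10 - n01

    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Phi is undefined when one label is constant; report None in that case.
    phi = (n11 * n00 - n10 * n01) / denom if denom else None

    return {
        "fake_share": (n11 + n10) / n,
        "clickbait_share": (n11 + n01) / n,
        "phi": phi,
    }


# Toy labels, made up for illustration only.
toy_fake = [1, 1, 1, 0, 0, 1]
toy_clickbait = [1, 1, 1, 0, 0, 0]
summary = label_summary(toy_fake, toy_clickbait)
```

The shares quoted in the description follow from the counts directly: 1,940 / 2,815 and 1,968 / 2,815 round to 69% and 70%.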
One important aspect of the training set is that it contains many repetitions.
This is not surprising, as the dataset attempts to represent the natural distribution of factual
vs. fake news online over a period of time. Since publishers of fake news often run a group of
websites that feature the same deceptive content, some repetition is to be expected.
In particular, the training set contains 434 unique articles that have duplicates.
These articles have three reposts each on average, and the most reposted article appears 45 times.
If we take the labels of the reposted articles into account, we see that a reposted article
is more likely to be fake news.
In the training set, 1,018 fake news articles have a duplicate, whereas
322 articles with genuine content have one.
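The repetition statistics above could be reproduced along the following lines. This is a minimal sketch, assuming the articles are available as (text, label) pairs. The function name `duplicate_stats`, the label strings, and the toy data are hypothetical, and matching duplicates by exact text after lowercasing and collapsing whitespace is an assumption, since the description does not say how duplicates were identified.

```python
from collections import defaultdict


def duplicate_stats(articles):
    """Summarize repeated articles in a list of (text, label) pairs.

    Duplicates are detected by exact match after lowercasing and collapsing
    whitespace. This matching rule is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for text, label in articles:
        key = " ".join(text.lower().split())
        groups[key].append(label)

    # Keep only texts that occur more than once.
    repeated = {k: labels for k, labels in groups.items() if len(labels) > 1}
    sizes = [len(labels) for labels in repeated.values()]

    # Count every copy of a repeated article under its own label.
    per_label = defaultdict(int)
    for labels in repeated.values():
        for label in labels:
            per_label[label] += 1

    return {
        "repeated_articles": len(repeated),
        "mean_copies": sum(sizes) / len(sizes) if sizes else 0.0,
        "max_copies": max(sizes, default=0),
        "copies_by_label": dict(per_label),
    }


# Toy data, made up for illustration only.
toy = [
    ("Miracle cure found", "fake"),
    ("miracle  cure found", "fake"),
    ("Miracle cure found ", "fake"),
    ("Budget vote passes", "genuine"),
    ("Budget vote passes", "genuine"),
    ("Local weather report", "genuine"),
]
stats = duplicate_stats(toy)
```

Note that the sketch counts every copy of a repeated article, including the first one. Whether the "reposts" figure in the description includes the original copy is not stated, so the averages may differ by one depending on the convention.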
Provided by:
OpenDataLab
Created:
2023-12-07



